HUMAN DNA SEQUENCES

Background of the Invention 

Current methods for testing pharmacological substances rely
on a three-stage testing approach to drug development. First,
candidate compounds are typically screened in some sort of in
vitro system, like inhibition of cancer cell growth. Candidates
are then tested in an animal model, as a first approximation of
systemic effects, including efficacy and toxicity. Compounds
that still show promise after these initial in vivo screens
finally are tested in humans. Again, human testing typically
occurs in three phases: toxicity, preliminary efficacy, and
efficacy. The entire process can take more than a decade and
cost hundreds of millions of dollars. Aside from the monetary
costs and protracted time scale, moreover, current testing
regimes waste the lives of countless laboratory animals and
needlessly endanger the lives of human subjects.

A need exists, therefore, for more sophisticated drug
screening techniques that can be done rapidly in vitro. These
screening techniques ideally will be reflective of systemic
and/or organ-specific responses, so that they provide a reliable
indicator of action in a human body. Current techniques,
however, tend to utilize only a single or limited number of
markers, thus answering only very simple questions that are of
questionable medical import. For example, a typical in vitro
assay may ask whether a lead compound binds a particular
receptor, which has been implicated in a certain disorder. It is
presumed that such binding is indicative of therapeutic
usefulness, but it does not even purport to address systemic
effects.

Not only are screening techniques for efficacy inadequate,
the available toxicity screens likewise are inadequate.
Toxicity, on a first level, is usually measured by animal
testing. Aside from the complications related to in vivo versus
in vitro testing, such screens are insufficient because of
differences in metabolism, uptake, etc., relative to humans.
Thus, improved methods would not only be in vitro-based, they
would also be more "human."

With the increasing miniaturization of screening assays and
the growing availability of targets for pharmaceutical
intervention, there is increasing interest in developing arrays
containing large numbers of these targets that can be assayed
simultaneously. If such an array contains a large enough
population of targets, it can be used to essentially mimic the
systemic response. In other words, the array becomes an in vitro
surrogate for the human body. The more refined the array, the
more accurate the predictive capability. In theory, an array
could be constructed that can detect all of the known human
expression products simultaneously, thereby providing a very
reliable indicator of the human response to a given compound.
These arrays offer advantages over the present in vitro screening
systems in that they can assay large numbers of responses
simultaneously. They are superior to animal testing because they
are more "human" and, thus, more predictive of human responses.

In order to construct such arrays, however, the field is in
need of further human targets. Advantageously, such targets will
be provided with additional physiologically relevant information,
such as whether the target is expressed in a particular tissue
and whether it is related to a known functional class of targets.
In this way, the artisan can focus as needed, for example, on
tissue-specific effects or target class-specific effects, thereby
providing information useful in evaluating efficacy and/or
toxicity.

In addition to a need for pharmacological screening targets,
there is a need for further pharmacological substances. These
substances can be used in the formulation of medicinal
compositions and in treating a wide variety of disorders.

The present invention responds to the aforementioned and
other needs in the field by providing a population of novel
targets useful, inter alia, in the profiling and medicinal
contexts described above.

Summary of the Invention 
It is an object of the invention, therefore, to provide a
set of human cDNA clones. Further to this object, the invention
provides sequences of human cDNA clones that were isolated from
libraries generated from different human tissues.

It is another object of the invention to provide assemblages
of targets useful in profiling matrices for screening
pharmacological test compounds. According to this object,
assemblages comprising different populations of human nucleic
acids, proteins and antibodies are provided. In different
embodiments, cDNA library-specific assemblages and target-family-
specific targets are provided.

It is a further object of the invention to provide a
database of human nucleotide and protein sequences. Further to
this object, novel human nucleotide and protein sequences are
provided in electronic form. In one embodiment, one or more of
these sequences is provided in a searchable database.

It is still another object of the invention to provide
biologically active target molecules useful in treating or
detecting human disorders. Further to this object, the invention
provides nucleic acid and protein molecules that have the
capacity to affect disease etiology or symptoms or correlate with
known disease states. Also further to this object, a database is
provided which comprises the disclosed molecules in electronic
form.

Detailed Description

The invention results from a need in the art for new human
nucleic acids and proteins. This need arises in several contexts.
First, there is a need to identify targets for therapeutic
intervention. Second, there is a need to identify molecules that
may be adversely affected in a therapeutic context, thereby
resulting in toxicity. Knowledge of these molecules will aid in
the design of new medicaments with enhanced efficacy and decreased
toxicity. Finally, the need encompasses human nucleic acids and
proteins that have medicinal applicability in their own right.

In view of these needs, the present inventors set out to
isolate and sequence human cDNAs from tissue-specific libraries.
In this way, they represent subsets of molecules likely to be
targets for therapeutic intervention or for avoiding toxicity. In
addition, the inventors divided the molecules into various
sub-categories, based on suspected functionality, structural
similarity, etc., which are of interest from a pharmacological
perspective.

GENERAL DESCRIPTION OF THE INVENTIVE MOLECULES

The present invention provides novel polynucleotide molecules
that, in some instances, have similarities with known molecules.
The inventive DNAs were cloned from five different human cDNA
libraries. In addition to these DNA molecules, the invention
provides their protein translations and antibodies derived from
them. The inventive DNA and protein sequences are shown
individually in the Description of the Sequences. The inventive
nucleic acids also include the complements of the DNA sequences
provided in the Description of the Sequences as well as their RNA
counterparts. Methods of producing the molecules also are
provided. Further, the invention provides methods for detecting
all or part of the molecules and of detecting polynucleotides
encoding all or part of the molecules.
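
By way of illustration only, the relationship between a disclosed DNA sequence, its complement and its RNA counterpart can be sketched as follows. The function names are hypothetical and are not part of the disclosure; the short fragment is taken from the start of the open reading frame of clone DKFZphamy2_10h17, and the RNA counterpart is assumed here to mean the same strand with thymine replaced by uracil.

```python
# Illustrative sketch; the function names are assumptions, not part of the disclosure.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(dna: str) -> str:
    """Return the base-by-base complement, read in the same direction."""
    return dna.translate(_COMPLEMENT)


def reverse_complement(dna: str) -> str:
    """Return the complementary strand written 5' to 3'."""
    return complement(dna)[::-1]


def rna_counterpart(dna: str) -> str:
    """Return the RNA counterpart of the given strand (T replaced by U)."""
    return dna.replace("T", "U").replace("t", "u")


if __name__ == "__main__":
    fragment = "ATGCCTGCCTTG"  # first four codons of the DKFZphamy2_10h17 ORF
    print(complement(fragment))
    print(reverse_complement(fragment))
    print(rna_counterpart(fragment))
```

Whether "complement" is read as the same-direction complement or as the reverse complement depends on the convention in use, so both are shown.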

The inventive molecules derive from five cDNA libraries:
human fetal brain, human fetal kidney, human melanoma, human
testis, and human amygdala. For convenience, each sequence bears
a designation that indicates from which library it is derived. In
particular, these designations are: "hfbr" for human fetal brain,
"hfkd" for human fetal kidney, "hmel" for human melanoma, "htes"
for human testis, and "hamy" for human amygdala. The individual
libraries were constructed and screened as described below in the
examples.

The protein and DNA molecules of the invention are variously
described herein as "target" molecules or "inventive" molecules.
The sequences and other information pertinent to the nucleic acid
and protein molecules of the invention are shown below in the
Description of the Sequences.

Description of the Sequences 

Key to the Description of the Sequences 
The descriptions below provide the coding sequences of the
inventive cDNAs, as well as the protein sequences and other
useful information, as set out herein.

Grouping 

The clones were assigned to the following sixteen functional
and/or tissue-derived groups:

1. Amygdala derived

2. Cell Cycle

3. Cell Structure and Motility

4. Differentiation/Development

5. Intracellular Transport and Trafficking

6. Melanoma derived

7. Metabolism

8. Nucleic Acid Management

9. Signal Transduction

10. Transmembrane Protein

11. Transcription Factors

12. Brain derived

13. Kidney derived

14. Mammary Carcinoma derived

15. Testes derived

16. Uterus derived

Description of Clone Files 

The individual clone files are structured in the same
pattern. The sections are separated by paragraphs.

1. Clone Name 

The clone names are deciphered with reference to the 
following example: 

DKFZphfkd2_3k12, wherein the code represents:

• producer of library ("DKFZ") (for convenience, this
reference may be eliminated)

• a "p" for "plasmid cDNA library" (for convenience, this
reference may be eliminated)

• library name (e.g. hfbr = human fetal brain; hfkd =
human fetal kidney; hmel = human melanoma; htes = human
testis; hamy = human amygdala)

• an underscore ("_") to separate library information
from plate information

• plate number (e.g. "3")

• plate coordinates (letter first, e.g. "k12")
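
The naming scheme above lends itself to mechanical decoding. The following is a minimal sketch, assuming that the library name consists of lower-case letters optionally followed by digits (as in "hfkd2" or "hamy2") and that the plate coordinates are a single letter followed by a number. The regular expression, the class and the function name are illustrative assumptions and are not taken from the disclosure.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Library codes and tissues as listed in the key above.
LIBRARIES = {
    "hfbr": "human fetal brain",
    "hfkd": "human fetal kidney",
    "hmel": "human melanoma",
    "htes": "human testis",
    "hamy": "human amygdala",
}

# Assumed layout: [DKFZ][p]<library>_<plate><coordinates>
_CLONE_RE = re.compile(
    r"^(?P<producer>DKFZ)?(?P<plasmid>p)?(?P<library>[a-z]+\d*)_"
    r"(?P<plate>\d+)(?P<coords>[a-z]\d+)$"
)


@dataclass
class CloneName:
    producer: Optional[str]      # "DKFZ", or None when the prefix was dropped
    plasmid_library: bool        # True when the "p" marker is present
    library: str                 # e.g. "hfkd2"
    tissue: Optional[str]        # looked up from LIBRARIES, None if unknown
    plate: int                   # e.g. 3
    coordinates: str             # e.g. "k12"


def parse_clone_name(name: str) -> CloneName:
    """Split a clone name into the components described in the key."""
    m = _CLONE_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised clone name: {name!r}")
    library = m.group("library")
    # Trailing digits are stripped only for the tissue lookup.
    tissue = LIBRARIES.get(library.rstrip("0123456789"))
    return CloneName(
        producer=m.group("producer"),
        plasmid_library=m.group("plasmid") is not None,
        library=library,
        tissue=tissue,
        plate=int(m.group("plate")),
        coordinates=m.group("coords"),
    )


if __name__ == "__main__":
    print(parse_clone_name("DKFZphfkd2_3k12"))
    print(parse_clone_name("hamy2_10h17"))
```

Because both the producer and the "p" marker may be eliminated, a library code that itself begins with "p" would be ambiguous under this sketch; none of the five libraries named here does.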

2. Group

3. Introduction

short review of the similarities, function of the protein and
possible applications

4. Short Information
specifications about the cDNA (who sequenced, completeness of the
cDNA, similarity, chromosomal localisation, length
of cDNA, localisation of poly A tail and polyadenylation signal)

5. cDNA-Sequence 

6. BLASTn Results 

search results of blasting the cDNA sequence against all public 
databases 

7. Medline Entries
information about genes/proteins similar to the novel cDNA (if 
available) 

8. Putative Encoded Protein Information 
specifications about the encoded protein (ORF : length and 
localisation of the reading frame) 

9. Protein Sequence

10. BLASTp Results 

search results of blasting the protein sequence against all 
public databases 

11. Pedant Information 

output of fully automated annotation: summarises peptide
information, homologies, patterns as follows:

[LENGTH]

- length of the protein = number of amino acid residues

[MW]

- molecular weight of the protein

[pI]

- isoelectric point

[HOMOL]

- shows protein with closest similarity to the cDNA-encoded
protein

[FUNCAT]

- functional information according to a catalogue
developed by Munich Information Center for Protein Sequences
(MIPS)

[BLOCKS]

- Blocks are multiply aligned ungapped segments
corresponding to the most highly conserved regions of
proteins. The blocks for the Blocks Database are made
automatically by looking for the most highly conserved
regions in groups of proteins documented in the Prosite
Database. The Prosite pattern for a protein group is not
used in any way to make the Blocks Database and the pattern
may or may not be contained in one of the blocks
representing a group. These blocks are then calibrated
against the SWISS-PROT database to obtain a measure of the
chance distribution of matches. It is these calibrated
blocks that make up the Blocks Database. The WWW versions of
the Prosite and SWISS-PROT Databases that are used on this
server are located at the ExPASy World Wide Web (WWW)
Molecular Biology Server of the Geneva University Hospital
and the University of Geneva. World Wide Web URL
http://blocks.fhcrc.org/blocks/about_blocks.html is the
entry point to the database.

- here Blocks segments found in the analysed protein 
sequences are displayed 

[SCOP]

Nearly all proteins have structural similarities with
other proteins and, in some of these cases, share a common
evolutionary origin. The scop database provides a detailed
and comprehensive description of the structural and
evolutionary relationships between all proteins whose
structure is known, including all entries in Brookhaven
National Laboratory's Protein Data Bank (PDB). It is
available as a set of tightly linked hypertext documents
which make the large database comprehensible and accessible.

In addition, the hypertext pages offer a panoply of
representations of proteins, including links to PDB entries,
sequences, references, images and interactive display
systems. World Wide Web URL http://scop.mrc-lmb.cam.ac.uk/scop/
is the entry point to the database.
Existing automatic sequence and structure comparison tools
cannot identify all structural and evolutionary
relationships between proteins. The scop classification of
proteins has been constructed manually by visual inspection
and comparison of structures, but with the assistance of
tools to make the task manageable and help provide
generality. Proteins are classified to reflect both
structural and evolutionary relatedness. Many levels exist
in the hierarchy, but the principal levels are family,
superfamily and fold. The exact position of boundaries
between these levels is to some degree subjective. Scop
evolutionary classification is generally conservative: where
any doubt about relatedness exists, we made new divisions at
the family and superfamily levels.

- here SCOP segments found in the analysed protein
sequences are displayed

[EC]

ENZYME is a repository of information relative to the
nomenclature of enzymes. It is primarily based on the
recommendations of the Nomenclature Committee of the
International Union of Biochemistry and Molecular Biology
(IUBMB) and it describes each type of characterized enzyme
for which an EC (Enzyme Commission) number has been
provided. World Wide Web URL http://www.expasy.ch/enzyme/ is
the entry point to the database.

- here EC-number and name of enzymes with similarity to 
the analysed protein sequences are displayed 

[PIRKW]

- functional information according to the Protein
Information Resource (PIR) database catalogue developed by
Munich Information Center for Protein Sequences (MIPS), the
National Biomedical Research Foundation (NBRF) and the
International Protein Information Database in Japan (JIPID).

[SUPFAM]

- information according to the Protein Information
Resource (PIR) database catalogue of protein superfamilies
developed by Munich Information Center for Protein Sequences
(MIPS), the National Biomedical Research Foundation (NBRF)
and the International Protein Information Database in Japan
(JIPID).

[PROSITE]

please refer to 12. PROSITE Motifs

[PFAM]

please refer to 13. PFAM Motifs

- overall 3-dimensional folding information

- 3D indicates that the protein is similar to a
protein of which a 3-dimensional structure is known

- overall structural information

O 

The last PEDANT-block depicts information about the
folding structure of the protein generated by PREDATOR.
PREDATOR is a secondary structure prediction program. It
takes as input a single protein sequence to be predicted and
can optionally use a set of unaligned sequences as additional
information to predict the query sequence. The mean
prediction accuracy of PREDATOR is 68% for a single sequence
and 75% for a set of related sequences. PREDATOR does not
use multiple sequence alignment. Instead, it relies on
careful pairwise local alignments of the sequences in the
set with the query sequence to be predicted.

World Wide Web URL
http://www.embl-heidelberg.de/argos/predator/predator_info.html
is the entry point to the database.

- H = helix, E = extended or sheet, C = coil, T =
transmembrane, B = beta

- x indicates a low-complexity region with repeat-like 
structure which is omitted in all BLAST searches 






12. PROSITE Motifs
PROSITE is a database of protein families and domains. It
consists of biologically significant sites, patterns and profiles
that help to reliably identify to which known protein family (if
any) a new sequence belongs. World Wide Web URL
http://www.expasy.ch/prosite/ is the entry point to the database.
A description of the PROSITE consensus patterns is provided
herein, after the description of the individual sequences.
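
As a hedged illustration of how a PROSITE consensus pattern can be applied to a protein sequence, the sketch below converts the pattern syntax into a regular expression and scans the 180-residue peptide reported for clone DKFZphamy2_10h17. The two pattern strings are quoted from memory of the PROSITE entries for ZINC_FINGER_C3HC4 and PRENYLATION and should be treated as assumptions to be checked against the database; the function name is hypothetical, and the converter covers only the basic syntax elements (x, x(n), x(n,m), [..], {..}, < and >).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate basic PROSITE pattern syntax into a Python regular expression."""
    parts = []
    for elem in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if elem.startswith("<"):          # anchored at the N-terminus
            prefix, elem = "^", elem[1:]
        if elem.endswith(">"):            # anchored at the C-terminus
            suffix, elem = "$", elem[:-1]
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", elem)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":                   # any residue
            core = "."
        elif core.startswith("{"):        # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        rep = ""
        if lo:                            # repetition count or range
            rep = "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + rep + suffix)
    return "".join(parts)


# Peptide of clone DKFZphamy2_10h17, frame 2 (180 residues).
PEPTIDE = "".join("""
MPALEGAPHT PPLPRRPRKG SSELGFPRVA PEDEVIVNQY VIRPGPSASA
ASSAAAGEPL ECPTCGHSYN VTQRRPRVLS CLHSVCEQCL QILYESCPKY
KFISCPTCRR ETVLFTDYGL AALAVNTSIL SRLPPEALTA PSGGQWGAEP
EGSCYQTFRQ YCGAACTCHV RNPLSACSIM
""".split())

# Pattern strings are assumptions recalled from PROSITE, not taken from the text.
PATTERNS = {
    "ZINC_FINGER_C3HC4": "C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA]",
    "PRENYLATION": "C-{DENQ}-[LIVM]-x>",
}

if __name__ == "__main__":
    for name, pat in PATTERNS.items():
        hit = re.search(prosite_to_regex(pat), PEPTIDE)
        if hit:
            # Report 1-based, inclusive residue positions.
            print(name, hit.start() + 1, hit.end(), hit.group())
```

Under these assumptions the two hits fall at residues 81 to 90 and 177 to 180, which agrees with the Prosite motif positions listed for this clone.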

13. PFAM Motifs
PFAM (protein families) is a large collection of multiple
sequence alignments and hidden Markov models covering many common
protein domains. World Wide Web URL http://www.sanger.ac.uk/Pfam/
is the entry point to the database.

In the charts below, the groups of sequences are listed, and
the description of the individual clones follows.
It contains one leucine zipper. The protein is believed to play a role in fatty
acid metabolism. It is ubiquitously expressed, with a slight predominance in uterus,
placenta and foreskin.


The L-isoaspartyl methyltransferase (Pimt), as an example, is a highly conserved
enzyme utilising S-adenosylmethionine (AdoMet) to methylate aspartate residues of
proteins damaged by age-related isomerisation and deamidation.


Homology


similarity to the 1-acyl-glycerol-
3-phosphate acyltransferase of Zea
mays.


similarity to beta-aspartate
methyltransferases.
DKFZphamy2_10h17



group: signal transduction 

DKFZphamy2_10h17 encodes a novel 180 amino acid protein which
shows weak similarity to murine had.

The novel protein contains a zinc finger motif of the C3HC4 type
(RING finger). The RING-finger domain is involved in mediating
protein-protein interactions. Proteins containing a RING finger
include mammalian V(D)J recombination activating protein (RAG1),
mouse rpt-1, human rfp, human 52 kD Ro/SS-A protein and others.
The family of RING finger proteins contains a number of
oncogenes, for example PML, a probable transcription factor,
BRCA1, and the mammalian cbl and bmi-1 proto-oncogenes.

The new protein can find application in modulating
protein-protein interactions and in studying the expression
profile of amygdala-specific genes.



weak similarity to had (Mus musculus)
Sequenced by LMU
Locus: unknown
Insert length: 835 bp

Poly A stretch at pos. 751, polyadenylation signal at pos.



1 CACAGAGATC ATTGTCAACC AGGCCTGTGG GGGGGACATG CCTGCCTTGG
51 AAGGGGCACC CCATACCCCG CCACTGCCAC GGCGGCCCCG TAAGGGAAGC
101 TCGGAGCTGG GCTTTCCCCG CGTGGCCCCA GAGGATGAGG TCATTGTGAA
151 TCAGTACGTG ATTCGGCCTG GCCCCTCGGC CTCGGCGGCT TCTTCGGCGG
201 CGGCAGGCGA GCCCCTGGAG TGCCCCACCT GTGGGCACTC CTACAATGTC
251 ACCCAGCGGA GGCCCCGCGT GCTGTCCTGC CTGCACTCTG TGTGTGAGCA
301 GTGCCTGCAG ATTCTCTACG AGTCCTGCCC CAAGTACAAG TTCATCTCCT
351 GCCCCACCTG CCGCCGTGAG ACTGTGCTCT TCACCGACTA CGGCCTGGCC
401 GCGCTGGCTG TCAACACGTC CATCCTGAGC CGCCTGCCGC CTGAGGCGCT
451 GACGGCCCCA TCCGGGGGTC AGTGGGGGGC TGAGCCCGAG GGCAGCTGCT
501 ACCAGACCTT CCGGCAGTAC TGTGGGGCCG CGTGCACCTG CCACGTGCGG
551 AACCCACTGT CCGCCTGCTC CATCATGTAG TAGCGCCTGC CTGCCCGCCA
601 CTGCCCGCTG AGCCTCGCTC GCTGCTTCTT CAGGGACCCG GCCCTGCCCT
651 GCCGCCCGCT GACCCTTCCT TCCCCACCAT GGCTTCCGGC CCCACCCCGA
701 GTGGCATTGT CGCTGCAGCC AACTTTGCCA TTAAAACTCT TTGCCAAAGT
751 TAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
801 AAAAAAAAAA AAAAGAAAAA AAAAAAAAAA AAAAG



55 



BLAST Results 
Medline entries



No Medline entry 



Peptide information for frame 2

ORF from 38 bp to 577 bp; peptide length: 180
Category: similarity to unknown protein

Classification: Cellular transport and traffic
Prosite motifs: PRENYLATION (177-180)
ZINC_FINGER_C3HC4 (81-90)
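
The stated ORF coordinates and peptide length can be cross-checked against the cDNA sequence. The sketch below is illustrative only: it assumes 1-based, inclusive ORF coordinates that exclude the stop codon, uses the standard genetic code, and translates just the first ten codons from the first 70 bases of the DKFZphamy2_10h17 insert. The helper names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# First 70 bases of the DKFZphamy2_10h17 insert, as listed above.
FIRST_70 = "".join(
    "CACAGAGATC ATTGTCAACC AGGCCTGTGG GGGGGACATG "
    "CCTGCCTTGG AAGGGGCACC CCATACCCCG".split()
)


def orf_codon_count(start: int, end: int) -> int:
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    length = end - start + 1
    if length % 3:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3


def translate(dna: str, start: int, n_codons: int) -> str:
    """Translate n_codons codons beginning at 1-based position start."""
    first = start - 1
    codons = (dna[j:j + 3] for j in range(first, first + 3 * n_codons, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


if __name__ == "__main__":
    print(orf_codon_count(38, 577))     # codons between bp 38 and bp 577
    print(translate(FIRST_70, 38, 10))  # first ten residues of frame 2
```

The codon count corresponds to the stated peptide length of 180, and the translated residues correspond to the start of the peptide sequence given for frame 2.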


1 MPALEGAPHT PPLPRRPRKG SSELGFPRVA PEDEVIVNQY VIRPGPSASA
51 ASSAAAGEPL ECPTCGHSYN VTQRRPRVLS CLHSVCEQCL QILYESCPKY
101 KFISCPTCRR ETVLFTDYGL AALAVNTSIL SRLPPEALTA PSGGQWGAEP
151 EGSCYQTFRQ YCGAACTCHV RNPLSACSIM




BLASTP hits 

No BLASTP hits available

Alert BLASTP hits for DKFZphamy2_10h17 frame 2
No Alert BLASTP hits found

Pedant information for DKFZphamy2_10h17, frame 2

Report for DKFZphamy2_10h17.2


[LENGTH] 180
[MW] 19400.27
CpU 7.^5 

[HOMOL] TREMBL:AC007727_7 gene: "F8K7.7"; Arabidopsis
thaliana chromosome 1 BAC F8K7 sequence, complete sequence. 3e-06

[BLOCKS] BLDDfi3^C
[BLOCKS] PF0mfe,2A
[BLOCKS] PRDD7b3H

[BLOCKS] BL0051A Zinc finger, C3HC4 type, proteins
[PROSITE] PRENYLATION 1
[PROSITE] ZINC_FINGER_C3HC4 1

[PFAM] Zinc finger, C3HC4 type (RING finger)

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 5.56 %
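
For machine handling of such a report, a minimal sketch follows. It assumes that each report line consists of a bracketed tag followed by a free-text value, and that a tag may repeat (as the BLOCKS and PROSITE tags do above). The function name and the sample text are illustrative assumptions; the actual PEDANT output format may differ.

```python
import re
from collections import defaultdict

# Assumed line layout: "[TAG] value"
_TAG_LINE = re.compile(r"^\[(?P<tag>[A-Za-z]+)\]\s*(?P<value>.*)$")


def parse_report(text: str) -> dict:
    """Collect the values of each bracketed tag, preserving their order."""
    fields = defaultdict(list)
    for line in text.splitlines():
        m = _TAG_LINE.match(line.strip())
        if m:
            fields[m.group("tag")].append(m.group("value").strip())
    return dict(fields)


# Hypothetical sample assembled from the report entries shown above.
SAMPLE = """\
[LENGTH] 180
[PROSITE] PRENYLATION 1
[PROSITE] ZINC_FINGER_C3HC4 1
[PFAM] Zinc finger, C3HC4 type (RING finger)
"""

if __name__ == "__main__":
    for tag, values in parse_report(SAMPLE).items():
        print(tag, values)
```

Lines that do not begin with a bracketed tag, such as the SEQ, SEG and PRD rows, are ignored by this sketch.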
SEQ MPALEGAPHTPPLPRRPRKGSSELGFPRVAPEDEVIVNQYVIRPGPSASAASSAAAGEPL

SEG • xxxxxxxxxx - . • - 

PRD cccccccccccccccccccccccccccccccceeeeeeeeeeecccccchhhhhhhcccc 

SEQ ECPTCGHSYNVTQRRPRVLSCLHSVCEQCLQILYESCPKYKFISCPTCRRETVLFTDYGL

SEG 

PRD cccccccccccccccceeeecchhhhhhhhhhhhhccccceeGGcccccceeeeeccccc 

SEQ AALAVNTSILSRLPPEALTAPSGGQWGAEPEGSCYQTFRQYCGAACTCHVRNPLSACSIM

SEG

PRD cchhhhhhhhhcccccccccccccccccccccccchhhhhhhcceeeecccccceeeccc 



Prosite for DKFZphamy2_10h17.2

PS00294 177->181 PRENYLATION PDOC00266

PS00518 81->91 ZINC_FINGER_C3HC4 PDOC00449




Pfam for DKFZphamy2_10h17.2

HMM_NAME Zinc finger, C3HC4 type (RING finger)

HMM

*CPICFcTFt31DyPUPFdePmf11PCgHsFCypCIrrLJ- C 

CP C Y+ +P+ L C+HS C+ C + ++ 

30 C 

Query 62 CPTCGHSYNVTQRRPRVLSCLHSVCEQCL-QILYESCPKYKFISC 105

HMM PmC*
PC

Query 106 PTC 108
group: signal transduction

DKFZphamy2_10p7 encodes a novel 1615 amino acid protein with
similarity to Na+/Ca2+ exchange proteins.

The transport of Ca2+ from the sarcoplasm into the sarcoplasmic
reticulum is an essential process in the initiation of muscle
relaxation.

In addition, the novel protein contains a PROSITE multicopper
oxidase signature. Multicopper oxidases are enzymes that possess
three spectroscopically different copper centers.

The new protein can find application in modulation of Na+/Ca2+
exchange and voltage-dependent processes.

similarity to Na+/Ca2+ exchange proteins

ATG in frame 3 is first in clone.

Sequenced by LMU

Locus: unknown

Insert length: 5236 bp
Poly A stretch at pos. 5216, no polyadenylation signal found



1 CGGACGCGTG GGCGGACGCG TGGGCCCTGT ATACCTGTGC CACTTTGTGC
51 CTTAAGGAAC AAGCTTGCTC AGCGTTTTCA TTTTTCAGTG CTTCTGAGGG
101 TCCCCAGTGT TTCTGGATGA CATCATGGAT CAGCCCAGCT GTCAACAATT
151 CAGACTTCTG GACCTACAGG AAAAACATGA CCAGGGTAGC ATCTCTTTTT
201 AGTGGTCAGG CTGTGGCTGG GAGTGACTAT GAGCCTGTGA CAAGGCAATG
251 GGCCATAATG CAGGAAGGTG ATGAATTCGC AAATCTCACA GTGTCTATTC
301 TTCCTGATGA TTTCCCAGAG ATGGATGAGA GTTTTCTAAT TTCTCTCCTT
351 GAAGTTCACC TCATGAACAT TTCAGCCAGT TTGAAAAATC AGCCAACCAT
401 AGGACAGCCA AATATTTCTA CAGTTGTCAT AGCACTAAAT GGTGATGCCT
451 TTGGAGTGTT TGTGATCTAC AGTATTAGTC CCAATACTTC CGAAGATGGC
501 TTATTTGTTG AAGTTCAGGA GCAGCCCCAA ACCTTGGTGG AGCTGATGAT
551 ACACAGGACA GGGGGCAGCT TAGGTCAAGT GGCAGTCGAA TGGCGTGTTG
601 TTGGTGGAAC AGCTACTGAA GGTTTAGATT TTATAGGTGC TGGAGAGATT
651 CTGACCTTTG CTGAAGGTGA AACCAAAAAG ACAGTCATTT TAACCATCTT
701 GGATGACTCT GAACCAGAGG ATGACGAAAG TATCATAGTT AGTTTGGTGT
751 ACACTGAAGG TGGAAGTAGA ATTTTGCCAA GCTCCGACAC TGTTAGAGTG
801 AACATTTTGG CCAATGACAA TGTGGCAGGA ATTGTTAGCT TTCAGACAGC
851 TTCCAGATCT GTCATAGGTC ATGAAGGAGA AATTTTACAA TTCCATGTGA
901 TAAGAACTTT CCCTGGTCGA GGAAATGTTA CTGTTAACTG GAAAATTATT
951 GGGCAAAATC TAGAACTCAA TTTTGCTAAC TTTAGCGGAC AACTTTTCTT
1001 TCCTGAGGGG TCGTTGAATA CAACATTGTT TGTGCATTTG TTGGATGACA
1051 ACATTCCTGA GGAGAAAGAA GTATACCAAG TCATTCTGTA TGATGTCAGG
1101 ACACAAGGAG TTCCACCAGC CGGAATCGCC CTGCTTGATG CTCAAGGATA
1151 TGCAGCTGTC CTCACAGTAG AAGCCAGTGA TGAACCACAT GGAGTTTTAA
1201 ATTTTGCTCT TTCATCAAGA TTTGTGTTAC TACAAGAGGC TAACATAACA
1251 ATTCAGCTTT TCATCAACAG AGAATTTGGA TCTCTAGGAG CTATCAATGT
1301 CACATATACC ACGGTTCCTG GAATGCTGAG TCTGAAGAAC CAAACAGTAG
1351 GAAACCTAGC AGAGCCAGAA GTTGATTTTG TCCCTATCAT TGGCTTTCTG
1401 ATTTTAGAAG AAGGGGAAAC AGCAGCAGCC ATCAACATTA CCATTCTTGA
1451 GGATGATGTA CCAGAGCTAG AAGAATATTT CCTGGTGAAT TTAACTTACG
1501 TTGGACTTAC CATGGCTGCT TCAACTTCAT TTCCTCCCAG ACTAGATTCA
1551 GAAGGTTTGA CTGCACAAGT TATTATTGAT GCCAATGATG GGGCCCGAGG
1601 TGTAATTGAA TGGCAACAAA GCAGGTTTGA AGTAAATGAA ACCCATGGAA
1651 GTTTAACATT GGTAGCCCAG AGGAGCAGAG AACCTCTTGG CCATGTTTCC
1701 TTATTTGTGT ATGCTCAGAA TTTGGAAGCA CAAGTGGGGC TGGATTATAT
1751 CTTCACCCCA ATGATTCTTC ATTTTGCTGA TGGAGAAAGG TATAAAAATG
1801 TCAATATCAT GATTCTTGAT GATGACATTC CAGAAGGAGA TGAAAAATTT
1851 CAGCTGATTT TAACAAATCC TTCTCCTGGA CTAGAGCTAG GGAAAAATAC
1901 AATAGCCTTA ATTATTGTCC TTGCTAATGA TGACGGCCCT GGAGTTCTAT
1951 CATTTAACAA CAGTGAGCAC TTTTTCCTAA GAGAGCCAAC AGCTCTCTAC
2001 GTCCAGGAGA GTGTTGCAGT ATTGTACATT GTTCGGGAAC CTGCACAAGG
2051 ATTGTTTGGA ACAGTGACAG TTCAGTTCAT TGTGACAGAA GTGAATTCCT
2101 CAAATGAATC TAAAGATCTG ACTCCTTCCA AAGGCTATAT TGTTTTAGAA
2151 GAAGGTGTTC GATTCAAGGC CCTACAAATA TCTGCCATAT TAGACACGGA
2201 ACCAGAAATG GATGAGTATT TTGTTTGCAC CTTGTTTAAT CCAACTGGAG
2251 GTGCTAGACT AGGGGTGCAT GTTCAAACCC TGATAACAGT TTTGCAAAAC
2301 CAGGCCCCTT TGGGGCTATT CAGTATCTCT GCAGTTGAAA ATAGAGCCAC
2351 CTCCATAGAC ATCGAAGAAG CCAATAGGAC CGTGTATTTA AATGTATCTC
2401 GAACTAATGG CATTGATTTG GCTGTGAGTG TGCAGTGGGA GACAGTATCT
2451 GAAACAGCCT TTGGCATGAG GGGAATGGAT GTTGTGTTTT CCGTATTTCA
2501 AAGTTTTTTG GATGAATCAG CTTCTGGCTG GTGTTTCTTT ACTTTGGAAA
2551 ATTTAATATA TGGTATAATG TTAAGAAAAT CATCTGTTAC TGTTTACCGA
2601 TGGCAGGGGA TTTTTATTCC AGTTGAGGAT TTAAATATAG AAAATCCTAA
2651 AACTTGTGAG GCCTTTAATA TTGGTTTTTC TCCCTACTTT GTGATTACTC
2701 ATGAAGAAAG AAATGAAGAA AAGCCTTCTC TTAACAGTGT GTTTACATTC
2751 ACATCTGGAT TTAAATTATT CCTGGTACAA ACAATCATTA TTCTGGAAAG
2801 TTCTCAAGTA AGATATTTTA CTTCAGACAG CCAAGATTAT TTAATCATTG
2851 CAAGTCAAAG AGATGATTCC GAATTAACTC AGGTCTTCAG GTGGAATGGA
2901 GGAAGCTTCG TGTTGCATCA AAAACTCCCT GTCCGAGGTG TGCTGACCGT
2951 GGCCTTGTTC AACAAGGGAG GCTCTGTGTT CTTAGCCATT TCCCAGGCTA
3001 ATGCCAGGCT AAACTCCCTT TTATTCAGAT GGTCTGGCAG TGGGTTTATT
3051 AACTTTCAAG AGGTGCCTGT CAGTGGGACA ACAGAAGTTG AGGCTTTGTC
3101 TTCAGCCAAT GATATTTACC TAATATTTGC CAAAAATGTC TTTCTAGGAG
3151 ATCAGAATTC AATTGATATT TTCATCTGGG AGATGGGACA GTCTTCCTTC
3201 AGGTATTTTC AGTCTGTAGA TTTTGCTGCT GTTAACAGAA TCCACTCCTT
3251 CACACCAGCC TCAGGAATAG CCCACATACT TCTTATTGGC CAAGATATGT
3301 CTGCTCTTTA CTGCTGGAAT TCGGAGCGTA ATCAATTCTC TTTTGTTCTG
3351 GAAGTACCTT CTGCTTATGA TGTGGCTTCT GTTACAGTAA AGTCCCTTAA
3401 TTCAAGCAAG AATTTAATAG CTCTAGTGGG AGCTCATTCA CATATATATG
3451 AGCTAGCCTA CATTTCCAGC CATTCTGACT TTATTCCTAG TTCAGGTGAA
3501 CTGATATTTG AACCTGGTGA GAGAGAAGCT ACAATAGCAG TAAATATCCT
3551 TGATGATACA GTTCCAGAAA AAGAAGAATC CTTCAAAGTT CAACTTAAAA
3601 ATCCCAAAGG AGGAGCAGAG ATTGGCATTA ATGATTCTGT AACAATAACC
3651 ATTCTGTCTA ATGATGATGC CTATGGAATT GTTGCATTTG CTCAGAATTC
3701 ATTATATAAG CAAGTGGAAG AAATGGAGCA AGATAGCCTA GTAACCTTGA
3751 ACGTTGAACG CTTAAAAGGA ACATATGGCC GTATAACCAT AGCATGGGAA
3801 GCTGATGGAA GTATTAGTGA TATATTTCCT ACCTCAGGAG TGATTTTATT
3851 TACTGAAGGC CAGGTACTGT CAACAATCAC TCTAACTATT CTTGCTGATA
3901 ATATACCAGA GTTATCAGAG GTTGTGATTG TAACCCTCAC CCGTATCACC
3951 ACAGAAGGGG TTGAGGACTC ATACAAAGGT GCTACTATTG ATCAGGACAG
4001 AAGCAAGTCT GTTATAACAA CTTTGCCCAA TGACTCACCT TTTGGCTTGG
4051 TGGGCTGGCG TGCTGCGTCT GTCTTCATTA GAGTAGCAGA GCCTAAAGAA
4101 AACACCACCA CTCTTCAGTT ACAAATAGCT CGAGATAAAG GACTACTTGG
4151 GGATATTGCC ATTCAGTTGA GAGCTCAACC CAATTTCTTA CTGCATGTCG
4201 ATAATCAAGC TACTGAGAAT GAAGATTATG TATTGCAAGA AACAATAATA
4251 ATAATGAAAG AAAACATAAA AGAAGCTCAT GCCGAAGTTT CCATTTTGCC
4301 GGATGACCTT CCTGAATTGG AGGAAGGATT TATTGTCACT ATCACTGAGG
4351 TGAACCTGGT GAACTCTGAC TTCTCTACAG GACAGCCAAG TGTGCGGAGG
4401 CCCGGAATGG AAATAGCTGA GATAATGATA GAAGAAAATG ACGATCCCAG
4451 AGGAATTTTT ATGTTTCATG TTACTAGAGG CGCTGGGGAA GTTATTACTG
4501 CCTATGAGGT GCCTCCACCC TTGAACGTTC TTCAAGTTCC TGTAGTCCGG
4551 CTGGCTGGAA GCTTTGGGGC AGTAAATGTT TATTGGAAAG CATCACCAGA
4601 CAGTGCTGGC CTGGAAGACT TTAAACCATC TCATGGGATT CTTGAATTTG
4651 CAGATAAACA GGTTACTGCA ATGATAGAAA TCACCATAAT TGATGATGCT
4701 GAATTTGAAT TGACAGAGAC GTTCAATATT TCCTTGATCA GTGTTGCTGG
4751 AGGTGGCAGA CTTGGTGATG ATGTTGTGGT AACTGTTGTT ATTCCACAAA
4801 ATGATTCTCC ATTTGGAGTA TTTGGATTTG AAGAAAAGAC TGTAAGTTAA
4851 ACATATCAGG GGAAAGCCTT GTTTCAGGCT AGCGTTTCAT GTAATTTTGA
4901 GTAGAAAGTG TCTCACATTT TTGTTTTGGA AGTCTTGGCC AGGCATGGTG
4951 GCTCATGCCA GTAATCCCAG CACTTTGGGA GGCCGGAGCG GGCAGATCAC
5001 GAGGTCAGGA GATTGACACC ATCCTGGCCA ATATGGTTGA ATTCCCGTCT
5051 CTACTGAAAG TACAAAAATT AGCTGGGCGT GGTGGCACAT GCCTGTATTC
5101 CCAGATACTT GGGAGGCTGA GGCAGGAGAC TCGCTTGAAC CCAGGAGGCA
5151 GAGGTTGCAG TGAGCTGAGA TCACGCCATT GCACTCCAGC CTGGCGACAT
5201 AGAGAGACTC CATCTCAAAA AAAAAAAAAA AAAAAG



BLAST Results
No BLAST result
Medline entries

No Medline entry

Peptide information for frame 3

ORF from 0 bp to 4847 bp; peptide length: 1616
Category: putative protein

Classification: Cell signaling/communication
Prosite motifs: MULTICOPPER_OXIDASE1 (151-171)


1 DAUADAWALY TCATLCLKEtf ACSAFSFFSA SEGPtfCFUMT SWISPAVNNS 
51 DPUTYRKNMT RVASLFSGtfA VAGSDYEPVT RtfUAIMtfEGD EPANLTVSIL 
101 PDDFPEMDES PLISLLEVHL MNISASLKNC PTIGtfPNIST VVIALNGDAF 
151 GVFVIYSISP NTSEDGLFVE V(3Et3P(2TLVE LMIHRTGGSL GflVAVEWRVV 

50 201 GGTATEGLDF IGAGEILTFA EGETKKTVIL TILDDSEPED DESIIVSLVY 
251 TEGGSRILPS SPTVRVNILA NDNVAGIVSF (3TASRSVIGH EGEILdFHVI 
301 RTFPGRGNVT VNIdKIIG(2NL ELNFANFSG(3 LFFPEGSLNT TLFVHLLDDN 
351 IPEEKEVYdV ILYDVRTdGV PPAGIALLDA (3GYAAVLTVE ASJ>EPHGVLN 
M01 FALSSRFVLL <2EANITI<2LF INREPGSLGA INVTYTTVPG MLSLKNdTVG 

55 M51 NLAEPEVDFV PIIGFLILEE GETAAAINIT ILEDDVPELE EYFLVNLTYV 

501 GLTMAASTSF PPRLDSEGLT AflVIIDANDG ARGVIEWtfflS RFEVNETHGS 
551 LTLVAdRSRE PLGHVSLFVY AONLEAtfVGL DYIFTPMILH FADGERYKNV 
bOl NIMILDDDIP EGDEKFOLIL TNPSPGLELG KNTI ALIIVL ANDDGPGVLS 
651 FNNSEHFFLR EPTALYVfiES VAVLYIVREP AtJGLFGTVTV (3FIVTEVNSS

701 NESKDLTPSK GYIVLEEGVR FKALfllSAIL DTEPEMDEYF VCTLFNPTGG 

7S1 ARLGVHVflTL ITVLONflAPL GLFSISAVEN RATSIDIEEA NRTVYLNVSR 

SOI TNGIDLAVSV CJUETVSETAF GMRGIID VVFS VFflSFLDESA S6UCFFTLEN 

5 BS1 LIYGIflLRKS SVTVYRUflGI FIPVEDLNIE NPKTCEAFNI GFSPYFVITH 

101 EERNEEKPSL NSVFTFTSGF KLFLVOTIII LESSflVRYFT SDSQDYLIIA 

151 SflRDDSELTfl VFRUNGGSFV LH<2KLPVRGV LTVALFNKGG SVFLAISCiAN 

10D1 ARLNSLLFRU SGSGFINFOE VPVSGTTEVE ALSSANDIYL IFAKNVFLGD 

1051 C2NSIDIFIWE MG<2SSFRYF<3 SVDFAAVNRI HSFTPASGIA HILLIGflDMS 

10 1101 ALYCWNSERN tJFSFVLEVPS AYDVASVTVK SLNSSKNLI A LVGAHSHIYE 

1151 LAYISSHSDF IPSSGELIFE PGEREATIAV NILDDT VPEK EESFKV(2LKN 

1201 PKGGAEIGIN DSVTITILSN DDAYGIVAFA (3NSLYK<2VEE MECDSLVTLN 

1S51 VERLKGTYGR ITIAUEADGS ISDIFPTSGV ILFTEG<2VLS TITLTILADN 

1301 IPELSEVVIV TLTRITTEGV EDSYKGATID (3DRSKSVITT LPNDSPFGLV 

15 1351 GURAASVFIR VAEPKENTTT LQLCIARDKG LLGDIAIHLR AfiPNFLLHVD 

1M01 NflATENEDYV LdETIIIMKE NIKEAHAEVS ILPDDLPELE EGFIVTITEV 

mSl NLVNSDFSTG (2PSVRRPGME IAEIMIEEND DPRGIFMFHV TRGAGEVITA 

1501 YEVPPPLNVL (3VPVVRLAGS FGAVNVYWKA SPDSAGLEDF KPSHGILEFA 

1551 DKOVTAPIIEI TIIDDAEFEL TETFNISLIS VAGGGRLGDD VVVTVVIPflN 
1601 DSPFGVFGFE EKTVS 






BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphamy2_10p7, frame 3 

TREMBL:AF055084_1 gene: "VLGR1"; product: "very large G-protein 
coupled receptor-1"; Homo sapiens very large G-protein coupled 
receptor-1 (VLGR1) mRNA, complete cds., N = 3, Score = 284, P = 1.2e-33 


TREMBL:DMAFTflT?^ gene: "Calx"; product: "CALX"; Drosophila 
melanogaster 3Na(+)-1Ca(2+) exchanger (Calx) mRNA, complete cds., 
N = 1, Score = 17B, P = 3.3e-0T 

>TREMBL:AF055084_1 gene: "VLGR1"; product: "very large G-protein 
coupled receptor-1"; Homo sapiens very large G-protein coupled 
receptor-1 (VLGR1) mRNA, complete cds. 

Length = 1967 






HSPs: 

Score = 284 (42.6 bits), Expect = 1.2e-33, Sum P(3) = 1.2e-33 
Identities = 192/738 (26%), Positives = 314/738 (42%) 



fluery: b7 

55 SG(JAVAGSDYEPVTR(2UAIM(2E6DEF ANLTVSILPDDFPEIIDESFLISLLEVHLflNISAS 12b 

S + G DY + a G + + +SI+ D+ E +E +E+ 

L + 
Sbjct: 102 SSASPGGVDYI-LHGSTVTFCHGfiNLSFINISIIDDNESEFEEP 

IEILLTGATGG 155 

Query: 127 

5 LKN(3PTIGi2PNISTVVIALNGDAFGVFVIYSISPNTSEDGLFVEV(2E(3P(2TLV-ELniHR IBS 

+G+ +S ++IA + FGV N S + + + 

T++ L++ R 

Sbjct: 15b A VLGRHLVSRIIIAKSDSPFGVIRFL NflSK 

ISIANPNSTHILSLVLER 2D3 


Query: Iflb TGGSLGOVAVEURVVGGTATEGL DFIG-AGEILTFAEGETK- 

KTVILTXXXXXXX 23ft 

TGG LG++ VUV6+EL D +F EGE 

+T+ILT 
15 Sbjct: EDM 

TGGLLGEIflVNWETVGPNSflEALLPflNRDIADPVSGLFYFGEGEGGVRTIILTIYPHEEI 2b3 

Query: 23«5 XXXXXXXXXLVYTEGGSRILPSSDTVRVNILANDNVA6IVSF-- 

(2TASRSVIGH EG 212 

20 L +G +++ + V + I + G+V F +T S+ 

EG 

Sbjct: 2bM 

EVEETFIIKLHLVKGEAKLDSRAKDVTLTIflEFGDPNG VVQFAPETLSKKTYSEPLALEG 323 

25 Query: 2«J3 EILflFHVIRTFPGR-GNVTVNUKIIGfl- 
NLELNFANFSGdLFFPEGSLNTTLFVHLLDDN 3SD 

+L +R G G + V U++ + ++ +F + SG +G + 

VHLL D 

Sbjct: 321 

30 PLLITFFVRRVKGTFGEINVYUELSSEFDITEDFLSTSGFFTIADGESEASFDVHLLPDE 363 
Query: 351 

IPEEKEVY(2VILYDVRT<2GVPPAGIALLDA(2GYAAVLTVEAS]>EPHGVLNFAL-SSRFVL MD1 
+PE +E Y + L V G A LD + +V A+D+PHGV 

35 FAL S R + 

Sbjct: 3AM VPEIEEDYVI(2LVSVE-- GGAELDLEKSITWFSVYANDDPHGV — 

FALYSDR<2SI M3N 

fluery: MID LQEANI — TI(2LFINREFGSLGAINVTYTTVPGMLSLKNflT- 
40 VGNLAEPEVDFVPIIGFL 4bb 

L N+ +1(3+ I R G+ G + V K Q V AE + 

L 

Sbjct: M35 LIGflNLIRSIfllNITRLAGTFGDVAVGLRISSDH KEtJPIVTENAERfl— 

L iia2 


fluery: Mb? 

ILEEGETAAAINITILEDDVPELEEYFLVNLTYVGLTIIAASTSFPPRLDSEGLTAflVIID 52b 
++++G T +1 LF+LVL PLE 

+ A V+ 

50 Sbjct: n&3 VVOGATYKVD VVPIKNflVFLSLGSNFTL<JLVTVf1LVGGRFYGMPTIL<2- 
EAKSA-VLPV 54D 

flUQry: 527 ANDGARGVIEU<3iJSRFEV-NETHGSLTLVA<3RSREPLGHVSLFV 

YA<2NLEA(3VGLDY 5S2 
55 + A + ++ + F++ N T G+ ++ R R G +S+ YA 

LE + 

Sbjct: SMI SEKAANS(3VGFESTAF(JL(1NITAGTSHVniSR- 
RGTYGALSVAUTTGYAPGLEIPEFIVV 511 
fluery: 563 -IFTPMI— 

LHF ADGERYKNVNIMILDDDIPEGDEKFflLILTNPSPGLELGKNTIALIIV hSI 

TP + L F + GE+ K V + P E F L L+ G 

5 + IV 

Sbjct: bOO GNMTPTLGSLSFSHGEflRKGVFLldTFPS — 
PGWPEAFVLHLSGVflSSAPGGAflLRSGFIV b5? 

fluery: b»4D LANDDGPGVLSFN- 
10 NSEHFFLREPTALYVflESVAVLYIVREPAflGLFGTVTVflFIVTEVN b16 

A + GV F+ .+S + + E T + ++ V L+ G + 

T 

Sbjct: b5fi -AEIEPflGVFflFSTSSRNIIVSEDTflll-IRLHVflRLF 

GFHSDLIKVSYflTTAG 70S 


fluery: b'H SSNESKDLTP-SKGYIVLEEGVRFKALfllSAILDTEPEMDEYFVCTL 

FNP 747 

S+ +I> P G + ++ +1+ I D E++E+F L 

F+ 

20 Sbjct: 70 e 5 

SAKPLEDFEPVflNGELFFflKFflTEVDFEITIINDflLSEIEEFFYINLTSVEIRGLflKFDV 7bfl 

Query: 746 TGGARLGVHVfiT-LITVLONflAPLGLFSISAVENR-ATSIDIE — — 
EANRTVYLNVSRT fiOl 

25 RL + +IT+L N G+ IS E A ++» E 1 

YL+ S+T 

Sbjct: ?bT NWSPRLNLDFSVAVITILDNDDLAGI1- 
DISFPETTVAVAVDTTLIPVETESTTYLSTSKT 627 

30 fluery: fiOH NGI 604 

I 

Sbjct: 6E6 TTI 330 

Score = 266 (39.9 bits), Expect = 4.0e-25, Sum P(3) = 4.0e-25 
Identities = 175/708 (24%), Positives = 306/708 (43%) 

fluery: 131 

PTIGflPNISTVVIALNGDAFGVFVIYSISPNTSEDGLFVEVflEflPflTLVELMIHRTGGSL 1"5Q 
P 16 +1 ++I N +A G+ P + EV+E L+ + 

40 + R G+ 

Sbjct: 3T PEIGNISIVRIIIMKNDNAE6II EFDPKYTA FEVEEDVG- 

LII1IPVVRLHGTY TO 

fluery: Ml GflVAVEUR VVGGTATEG- 
45 LDFIGAGEILTFAEGETKKTVILTXXXXXXXXXXXXXXXXLV EMI 

G V ++ +A+ G +D+I G +TF G+ + ++ 

L 

Sbjct: «=J1 

GYVTADFISflSSSASPGGVDYILHGSTVTFflHGflNLSFINISIIDDNESEFEEPIEILLT ISO 


fluery: ESQ YTEGGSRILPSSDT VRVNIL AND N VAGI VSFflTASRS VIGHEGE-- 
ILflFHVIRTFPGRG 307 

GG+ +L R+ I +D+ C++ F S+ I + IL h 

RT G 
55 Sbjct: 151 GATGGA- 

VLGRHLVSRIIIAKSDSPFGVIRFLNflSKISIANPNSTfllLSLVLERTGGLLG EOT 
Query: 3DA NVTVNUKIIGflN LELN — FAN-FSGflLFFPEGSLNT- 

TLFVHLLDDNIPEEKEVY 35fl 

+ VNU+ +G N L N A+ SG +F EG T+ + + 

E +E + 
5 Sbjct: SID 

EIOVNUETVGPNSflEALLPtf NRDIADPVS6LFYFGEGEGGVRTIILTIYPHEEIEVEETF 2b c J 

fluery: 351 flVILYDVRTCJGVPPAGIALLDAflGYAAVLTVEASDEPHGVLNFA 

LSSRFV---LLCE M12 

10 + L+ V+ G A LD++ LT++ +P+GV+ FA LS + 

L E 

Sbjct: 27D IIKLHLVK 

GEAKLDSRAKl>VTLTIfiEFGDPNGVV(2FAPETLSKKTYSEPLALE 322 
15 (Juery: 413 

ANITIGLFINREFGSLGAINVTYTTVPGNLSLKNflTVGNLAEPEVDFVPIIGFLILEEGE 472 
+ I F+ R 6+ 6 I V + L ++ ++ E DF+ 

GF + +GE 

Sbjct: 323 GPLLITFFVRRVKGTFGEIMVYU ELSSEF— DITE 

20 DFLSTSGFFTIADGE 37D 

(Juery: 473 

TAAAINITILEDDVPELEEYFLVNLTYVGLTMAASTSFPPRLDSEGLTAflVIIDANDGAR 532 
+ A+ ++ +L D+VPE+EE +++ L S LD E 

25 + AND 

Sbjct: 3?1 SEASFDVHLLPDEVPEIEEDYVIdLV 

SVEGGAELDLEKSITUFSVYANDDPH 422 

fluery: S33 GVIEtJfifiSRFEV NETHGSLTLVAfiRSREPLGHVS — 

30 LFVYAt3NLEA(2VGLDYIFTPM 5fl? 

GV R + S+ + R G V+ L + + + E + 

+ + 
Sbjct: M23 

GVFALYSDRflSILIGCNLIRSIOINITRLAGTFGDVA VGLRISSDHKE(2PIVTENAER(3L 482 


finery: S&& ILHFAPGERYKNVNIMILDDDI — PEGDE-KFOLILTNPSPGLELGKNTI — 
-ALIIVLA LM1 

++ DG YK V+++ + + + G QL+ G G TI 

A VL 

40 Sbjct: i»fl3 VVK — DGATYK- 

VDVVPIKNflVFLSLGSNFTLflLVTVMLVGGRFYGMPTILflEAKSAVLP 531 

Query: b42 

NDDGPGVLSFNNSEHFFLREPTALYVi2ESVAVLYIVREPA(2GLFGTVTV<2FIV TE bib 

45 + NS+ F E TA + A ' V +6 +G ++V + 

E 

Sbjct: 540 VSEKAA NSOVGF— 

ESTAFflLIINITAGTSHVrilSRRGTYGALSVAIilTTGYAPGLE 512 

50 fiuery: ^17 

VNSSNESKDLTPSKGYIVLEEGVRFKALtJISAILDTEPEHDEYFVCTLFNPTGGARLGVH 75b 
+ ++TP+ G+ G+K++ P EFVL 

A G 

Sbjct: 513 IPEFIVVGNMTPTLGSLSFSHGEdRKGVFLWTF-- 
55 PSPGUPEAFVLHLSGV<JSSAPGGA<2 bSO 

fluery: 757 VQTLITVLQNflAPLGLFSISAVENRATSIDIEEANRTVYLNVSRTNGI— 
DLAVSVflUET B14 
+++ V + + P + G+F S +R +1 + E + + L+V R G PL 

+ V + + T 

Sbjct: bSl LRSGFIVAEIE-PriGVF<2FST-SSR-- 
NIIVSEDT(2niRLHV<3RLFGFHSI>L-IKVSYflT 70S 


fluery: fll5 VSETAFGMRGMDVVFS VFfiSFLDE 63B 

++A+ +V+ Ffi F E 
Sbjct: 7Db TAGSAKPLEDFEPV(3NGELFF<2KF<3TE 732 

Score = 246 (36.9 bits), Expect = 4.1e-32, Sum P(3) = 4.1e-32 
Identities = 92/338 (27%), Positives = 157/338 (46%) 

fluery: 511 PPRLDSEGLTAfl VIIDANDGARGVIEU-- 
(2<3SRFEVNETHGSLTLVA<2RSREPLGHVSLF Sbfi 
15 PP + + + ++II ND A G + IE + + + FEV E 6 + + R 

G + V+ 

Sbjct: 3fi PPEIGNISIV- 

RIIINKNDNAEGIIEFDPKYTAFEVEEDVGLIHIPVVRLHGTYGYVTAD 

20 fiuery: SbT VYA<3NLEA<2VG- 

LDYIFTPMILHFADGERYKNVNIMILDDDIPEGDEKFCLILTNPSPGL b27 

+(3+ A G +DYI + F G+ +NI I + DD+ E +E 

+++LT + G 
Sbjct: ^7 

25 FIS(2SSSASPGGVDYILHGSTVTF(3HG(2NLSFINISIIDI>NESEFEEPIEILLTGATGGA 15b 
Query: b2fl 

ELGKNTIALIIVLANDDGPGVLSFNNSEHFFLREPTALYVt3ESVAVLYIVREPA(3GLFGT bfi7 
LG+ + ++ 11+ +D GV+ FN + P S +L +V E 

30 GL G 

Sbjct: 157 VLGRHLVSRIIIAKSDSPFGVIRFLN(2SKISIANPN- 

STMILSLVLERTGGLLGE 210 

Query: bflfl VTVtfFIVTEVNSSN ESKDLT-PSKGYIVLEEGVR- 

35 FKALfiISAILDTEPEM>EYFV 741 

+ V ■ + NS + ++P+ P G EG + + ++ £ 

E++E F+ 
Sbjct: 211 

I(3VNUETVGPNS(3EALLP(3NRI>IAJ)PVSGLFYFGEGEGGVRTIILTIYPHEEIEVEETFI 270 


<2uery: 742 CTLFNPTGGARLGVHVt3TL-ITVL(3N(3APLGL--FSISAVENRATSII>IE- 
EANRTVYLN 7^7 

L G A+L + + +T+ + PG+F+ ++S+E 

+ 

45 Sbjct: 271 

IKLHLVKGEAKLDSRAKDVTLTI(2EFGDPNGVV(3FAPETLSKKTYSEPLALEGPLLITFF 330 

duery: 7Tfi VSRTNGIDLAVSVcJUETVSETAFGnRGMDVVFSVFtSSFLDESASGLJCFFTL 
S4 fl 

50 V R G + V WE SE F + + FL S SG FFT + 

Sbjct: 331 VRRVKGTFGEIM VYWELSSE FDITEDFL — STSG — FFTI 

3bb 

Score = 24b Ob-I bits)-. Expect = LTe-lT-. Sum P(3) = 1-Te-n 
55 Identities = 87/303 (2fl*>-, Positives = 136/303 (45*) 

Query: llb2 PSSGELIFEPGEREA-TIA VNILDDT VPEKEESFKV0LKNPKGGAEIGIN- 
DSVTITILS 121T 
P SG F GE TI + I E EE+F ++L KG A++ 

VT+TI 

Sbjct: 23b 

PVSGLFYFGEGEGGVRTIILTIYPHEEIEVEETFIIKLHLVKGEAKLDSRAKDVTLTICE 215 


fluery: 1220 NDDAYGIVAF AONSL 

YKflVEEMEflDSL VTLNVERLKGTYGRITIAWEADGSIS--:- 1272 

D G+V FA +L Y + + E L+T V R+KGT+G I + UE 

Sbjct: 2lb 

10 FGDPNGVVCFAPETLSKKTYSEPLALEGPLLITFFVRRVKGTFGEIMVYIdELSSEFDITE 355 
fluery: 1273 

DIFPTS6VILFTE6(2VLSTITLTILADNIPELSEVVIVTLTRITTEGVEDSYKGATID(2D 1332 
D TSG +G+ ++ + +L D +PE+ E ++ L ++ EG 

15 GA +D + 

Sbjct: 35b DFLSTSGFFTIADGESEASFDVHLLPDEVPEIEEDYVK2L--VSVEG 

-GAELDLE MO? 

fluery: 1333 

20 RSKSVITTLPNDSPFGLVGURAASVFIRVAEPKENTTTLlJL(3IARDKGLLGDIAIHLRA<3 1312 

+S + + ND P G+ + I + + ++<2 + I R G 

GD+A+ LR 

Sbjct: HOB KSITWFSVYANDDPHGVFALYSDRflSILIGfl — 
NLIRSIOINITRLAGTFGDVAVGLRIS MbS 


fluery: 1313 PNFLLHVDNfi- 

ATENEDYVLCETIIIMKENIKEAHAEVSILPDPLPELEEGFIVTITEVN 1451 

+ H + TEN E +++K+ VI L F 

+ + V 

30 Sbjct: 4bb SJ> HKEflPIVTENA 

EROL VVKDGATYKVDVVPIKNflVFLSLGSNFTLOLVTVn 517 

fluery: 1452 LVNSDFSTGfiPSV 14b4 
LV F G P++ 
35 Sbjct: 516 LVGGRFY-GMPTI 521 

Score = 24b Ob-I bits)i Expect = l.*)e-11i Sum P(3) = l-le-n 
Identities = a c l/334 (2b*) , Positives = 150/334 (44*) 

40 fluery: 1151 

DFIPSSGELIFEPGEREATIAVNILDDTVPEKEESFKVflLKNPKGGAEIGINDSVTITIL 12ia 
D+I + F+ G+ +1 ++I+DD E EE ++ L GGA +G + 

I I 

Sbjct: no 

45 DYILHGSTVTF<2HG(2NLSFINISIII>DNESEFEEPIEILLTGAT6GAVLGRHLVSRIIIA IbT 
(2uery: 1211 

SNl>DAY6IVAFAi3NSLYIC(3VEEMEt2I>SLVTLNVERLKGTYGRITIAlilEADGSIS 1272 

+D +G++ F S + +++L +ER G G I + UE G 

50 S 

Sbjct: 170 KSDSPFGVIRFLNfiSKIS- 
IANPNSTMILSLVLERTGGLLGEI(3VNblETVGPNS(3EALLP 228 

fluery: 1273 DIF-PTSGVILFTEG(3V- 

55 LSTITLTILADNIPELSEVVIVTLTRITTEGVEDSYKGA 1327 

DI P SG+ F EG+ + TI LTI E+ E 1+ |_ + E 

]>S 
Sbjct: 221 

(JNRDIADPVSGLFYFGEGEGGVRTIILTIYPHEEIEVEETFIIKLHLVKGEAKLDS 2fiM 

Query i 1328 TIDGDRSKSVITTLPN-DSPFGLVGWRAASVFIRV-AEPK-- 
5 ENTTTL<2L(2IARDKGLLG 13S3 

R+K V T+ P G+V + ++ + +EP E + + 

R KG G 

Sbjct: ES5 

RAKDVTLTIQEFGDPNGVVfiFAPETLSKKTYSEPLALEGPLLITFFVRRVKGTFG 331 


fluery: 13AM 

DIAIHLRAflPNFLLHVDNfl ATENEDYVLflETIIinKENIKEAHAEVSILPDDLPELEEGF 1MM3 
+1 + + F + ED++ + + EA +V 

+LPD++PE+EE + 

15 Sbjct: 3M0 EIMVYWELSSEFDI ■ 

TEDFLSTSGFFTIADGESEASFDVHLLPDEVPEIEEDY 311 

(Juery: 1MMM IVTITEVNLVNSDFSTGfiPSVRRPGIIEIAEINIEENDDPRGIFnFHVTR 

IM12 

20 +++V +++I + nddp g+f + R 

Sbjct: 3^2 VlflLVSVE GGAELDLEK SITUFSVYANDDPHGVFALYSDR 

431 

Score = 237 (35.6 bits), Expect = 1.4e-34, Sum P(3) = 1.4e-34 
Identities = 101/367 (27%), Positives = 165/367 (44%) 

Ouery: b7 

SGflAVAGSDYEPVTRflUAiriflEGDEFANLTVSILPDDFPEPlDESFLISLLEVHLMNISAS 12b 
S + G DY + Q G + + +SI+ D+ E +E +E+ 

30 L + 

Sbjct: 102 SSASPGGVDYI-LHGSTVTFflHGflNLSFINISIIDDNESEFEEP 

IEILLTGATGG 1SS 

fiuery: 127 

35 LKNfiPTIGflPNISTVVIALNGDAFGVFVIYSISPNTSEDGLFVEVtJEflPflTLVELfllHRT Iflb 

+G+ +S ++IA + FGV N S+ + 

++ L++ RT 

Sbjct: 15b A- VLGRHLVSRIIIAKSDSPFGVIRFL N<2SKISI— 

ANPNSTHILSLVLERT 2DM 


fiuery: la? GGSLGflVAVEURVVGGTATEGL DFIG-AGEILTFAEGETK- 

KTVILTXXXXXXXX 231 

GG LG++ VUVG+EL 5 +F EGE +T+ILT 

Sbjct: 20S 

45 GGLLGEIflVNWETVGPNSflEALLPflNRDIADPVSGLFYFGEGEGGVRTIILTIYPHEEIE 2bM 

fiuery: 2MQ XXXXXXXXLVYTEGGSRILPSSDTVRVNILANDNVAGIVSF-- 
(3TASRSVIGH EGE 213 

L +G +++ + V + I + G+V F +T S+ 

50 EG 

Sbjct: 2bS 

VEETFIIKLHL VKGEAKLDSRAKPVTLTICEFGDPNG VV(2FAPETLSKKTYSEPLALEGP 32M 

fiuery: 21M ILCFHVIRTFPGR-GNVTVNWKIIGfl- 
55 NLELNFANFSGC2LFFPEGSLNTTLFVHLLDDNI 351 

+L +R G G + V W++ + ++ +F + SG +G + 

VHLL D + 
Sbjct: 3HS 

LLITFFVRRVKGTFGEinVYUELSSEFDITEDFLSTSGFFTIADGESEASFDVHLLPDEV 3S4 
fluery: 352 

5 PEEKEVYflVILYPVRTflGVPPAGIALLDAOGYAAVLTVEASDEPHGVLNFAL-SSRFVLL 410 

PE +E Y + L V G A LD + +V A+D+PHGV FAL 

S R +L 

Sbjcf- 3BS PEIEEDYVIflLVSVE- GGAELDLEKSITWFSVYANDDPHGV-- 

FALYSDRtfSIL 435 


fluery: 411 (2EANI — TIeJLFINREFGSLGAINV 433 

N+ I R G+ G + V 

Sbjct: 43b IGflNLIRSI(2INITRLAGTFGDVAV 4b0 

Score = 230 (34.5 bits), Expect = 2.3e-14, Sum P(3) = 2.3e-14 
Identities = ^5/31.8 <2b50i Positives = lb4/3bfl (442) 

Query. 1S4Q EMEC2D- 

SLVTLNVERLKGTYGRITIAIilEADGSISDIFPTSGVILFTEGflVLSTITLTILA 12«!fl 
20 E+E+D L+ + V RL GTYG +T + + S + P GV G 

ST+T 

Sbjct: ?1 EVEEDVGLII1IPVVRLHGTYGYVTADFISI2SSSAS--P-GGVDYILHG 

STVTFflH-G 123 

25 fluery: ISIT DNIPELSE VVIVTLTRITTEGVEDSYKGATIDflDRSKSVITTL — - 
PNDSPFGLVGURAA 1355 

N+ ++ +1 E +E GAT + +++ + 

+DSPFG++ + 
Sbjct: 124 

30 (2NLSFINISIIDDNESEFEEPIEILLTGATGGAVLGRHLVSRIII AKSDSPFGVIRFLNfl 133 

fluery: 135b SVFIRVAEPKENTTTL(3L(2IARDKGLLGDIAIHLRA(2- 
PNFLLHVDN<3ATENEDYVL<3ET 1414 

S I+AP +T LL + R GLLG+I ++ PN + (3 + 

35 D V 

Sbjct: lfi4 SK-ISIANPN- 

STMILSLVLERTGGLLGEIflVNUETVGPNSflEALLPflNRDIADPV — SG 23T 

(Suery: 141S IIIMKENIKEAHAEV- 
40 SILPDDLPELEEGFIVTITEVNLVNSDFSTGI2PSVRRPGNEIAE 1473 

+ E + +1 P + E+EE FI+ +++LV G+ + 

++ 

Sbjct: 240 LFYFGEGEGGVRTIILTIYPHEEIEVEETFII KLHLVK ~ 

GEAKLDSRAKDVT- 2=10 


fluery: 1474 

ItllEENDDPRGIFMFHVTRGAGEVITAYEXXXXXXXXXXXXXXXAGSFGAVNVYIiJKASPD 1S33 
+ I+E DP G+ F + + + G+FG + 

VYU+ S + 
50 Sbjct: 211 

LTI<3EFGDPNGVV<2FAPETLSKKTYSEPLALEGPLLITFFVRRVKGTFGEIt1VYWELSSE 3S0 
Ouery: 1534 

SAGLEDFKPSHGILEF ADKflVT AMIEITIIDDAEFELTETFNISLISVAGGGRLGDDVVV lS^ 
55 EDF + G AD + A ++ ++ J> E+ E + I L+SV 6G 

L + + 
Sbjct: 351 

FDITEDFLSTSGFFTIADGESEASFDVHLLPDEVPEIEEDYVIflLVSVEGGAELDLEKSI 410 
fiuery: 15T4 T-VVIPfiNDSPFGVF lbD7 

T + ND P GVF 
Sbjct: mi TUFSVYANDDPHGVF 455 


Score = 190 (28.5 bits), Expect = 7.5e-11, Sum P(3) = 7.5e-11 
Identities = 139/591 (23%), Positives = 247/591 (41%) 

fluery: b7 

10 SG(MVAGSDYEPVTR<2UAIM(2EGDEFANLTVSILPDDFPEf1DESFLISLLEVHLF1NISAS 15b 

+G A D+EPV Q+ + D E++E F I+L V 






Sbjct: 7D7 

AGSAKPLEDFEPV<2NGELFF<2KFi2TEVDFEITIIND<2LSEIEEFFYINLTSv*EIRGL(3KF 7bb 



Query. 157 LKN-flPTIGflP-NISTVVIALNGDAFGVFVIY- 
SISPNTSEDGLFVEV<2E(2P<3TLVELni 1B3 

N P + +++ + I ND G+++ + + D +v++ 

T L 
20 Sbjct: 7b7 

DVNUSPRLNLDFSVAVITILDNDDLAGNDISFPETTVAVAVDTTLIPVETESTTY — LST 654 

fluery: 134 HRTGGSLGflVA VEURVVGGTATEGLDFIGAGEILTF — 
AEGETKKTVILTXXXXXXXXXX Ell 
25 +T L V +V T 6+1 +++T ++K + T 

Sbjct: A5S SKTTTIL(3PTNVV-AIV — TEATGVSAIPE- 
KLVTLHGTPAVSEKPDVATVTANVSIHGT fiflO 

fluery: 545 XXXXXXLVYTEGGSRILPSSDTVRVNILANDNVAGIVSF-- 
30 (JTASRSVIGHEGEILflFHV 5^ 

+VY E + + +T V I G VS +T E 

L F 

Sbjct: SSI FSLGPSIVYIEEEMKN- 

GTFNTAEVLIRRTGGFTGNVSITVKTFGERCAfiMEPNALPF-- 13? 


Ouery: 300 

IRTFPGRGNVTVNUKIIGflNLELNFANFSGiJLFFPEGSLNTTLFVHLLDDNIPEEICEVYfl 351 
R G N+T U+ E+F +LF+G + V +LDD+ 

PE +E + 

40 Sbjct: 138 -RGIYGISNLT — WAVE 

EEDFEE(3TLTLIFLDGERERKVSVi2ILDDDEPEG<3EFFY 11D 

Cuery: 3b0 VILYDVRTflGVPPAGIALLD A<2 GYAA — 

VLTVEASDEPHGVLNFALSSRFVL-LflEA 413 
45 V L + P G +++ + G+AA ++ + SD +G++ F+ S+ 

L L+E 

Sbjct: 111 VFLTN- 

P<2G6Af2IVEGKDDTGFAAFAI1VIITGSDLHNGIIGFSEES(3SGLELREG 1044 

50 fluery: 414 NITIflLFI— NREFGSLGAI- 

NVTYTTVPGflLSLKNUTVGNLAEPEVDFVPIIGFL 4bb 

+ +L + NR F + VT ++ L+ V NL E E+ 

V G 

Sbjct: 1045 AVMRRLHLIVTRtJPNRAFEDVKVFIilRVTLNKT — VVVL(3KDGV-NLI1E- 
55 EL(2SVS — GTT lOlfl 

(2uery: 4b7 ILEEGETAAAINITILEDDVPELEEYFLVNL — 
TYVGLTMAASTSFPPRLDSEGLTAtfVI 524 

G+T I+I + + VP++E YF V L G + S F 

E +0 + 
Sbjct: 10=1=1 

TCTnGfiTKCFISIELKPEKVPfiVEVYFFVELYEATAGAAINNSARFAQIKILESDESflSL 1156 


(3uery: S25 IDANDGARGVIEWGflSRF EVNETHGS- 

LTLVA(2RSREPLGHVSLFVYA£2NLEA<3VGL SfiD 

+ + G+R + +++ +V G+ L + S + L 

A G 

10 Sbjct: 1151 

VYFSVGSRLAVAHKKATLISLQVARDSGTGLMMSVNFSTflELRSAETIGRTIISPAISGK 1215 

(Juery: 5fil DYIFTPMILHFADGERYKNVNIIIILDD — 
DIPEGDEKFflLILTNPSPGLELGKNTIALII fa3B 
15 T>++ T L F G+R +++++ + + + + Fi3 + + L +P G + 

K I 
Sbjct: 1211 

DFVITEGTLVFEPGflRSTVLDVILTPETGSLNSFPKRFfllVLFDPKGGARIDKVYGTANI 1275 

20 fluery: b31 VLAND-DGPGVLSFNNSEH L-5b 

L +J> J> + + H 
Sbjct: 127=1 TLVSD ADSflAIWGLADflLH 12=17 

Score = 188 (28.2 bits), Expect = 1.2e-33, Sum P(3) = 1.2e-33 
Identities = 84/329 (25%), Positives = 146/329 (44%) 

fluery: 112fc» SVTVKSLNS 

SKNLIALVGAHSHIYELAYISSHSDFIPSSGELIFEP6EREATIA V HAD 

S+TVK+ N + G+IL+ DF + LIF 

30 GERE ++V 

Sbjct: 117 SITVKTFGERCAtJMEPNALPFRGIYG- 
ISNLTWAVEEEDFEEflTLTLIFLDGERERKVSV =175 

fluery: 1151 NILDDTVPEKEESFKVCLKNPKG6AEI-- GINDS VTITILSNDDAY- 

35 GIVAFAflNS 1B33 

ILDD PE +E F V L NP+GGA+I G +J>+ + I++ » + 

GI+ F++ S 
Sbjct: 17b 

(2ILDDDEPEG(2EFFYVFLTNPflGGA(3IVEGKDDTGFAAFAMVIITGSDLHNGII6FSEES 1035 


Ouery: 123M LYKflVEEMEflDSLVT LNVERLKG-TYGRITIAUEAD- 

GSISDIFPTSGVILFTEGflV 1280 

+ E+ + +++ LVR + gv 

LEU 
45 Sbjct: lD3b — 

HJSGLELREGAVriRRLHLIVTRflPNRAFEDVKVFIiJRVTLNKTV VVLt2KD6VNLf1EEL(2S 10=13 

tfuery: 123=1 LSTITLTILADNIPELS-EVVIVTLTRITTEGVEDSYK 

GATIDflDRSKSVITTLPNl) 13M1 
50 +S T + +S E+ + ++ + Y+ GA 1+ + 

I L +D 
Sbjct: 10m 

VSGTTTCTIIGflTKCFISIELKPEKVPfiVEVYFFVELYEATAGAAINNSARFACIKILESD 1153 

55 (Juery: 1345 SPFGLVGURAASVFIRVAEPKENTTTL(2L(2IARDKG — LLGDIAI 

HLRA(2PNFLLHV 131=1 

LV + S R + A + T + L<2+ARD G L+ + LR+ 

+ 
Sbjct: 1154 ESOSLVYFSVGS 

RLAVAHKKATLISLfiVARDSGTGLfltlSVNFSTflELRSAETIGRTI 1210 

(Juery: 140 D DNfiATENEDYVLflETIIIflKENIKEAHAEVSILPD 1434 

+ A +P+V+ E ++ + + +V + P+ 
Sbjct: 1211 ISPAISGKDFVITEGTLVFEPGCJRSTVLDVILTPE 1245 

Score = 186 (27.9 bits), Expect = 2.5e-13, Sum P(3) = 2.5e-13 
Identities = 75/242 (30%), Positives = 113/242 (46%) 



duery: 120b 

EIGINPSVTITILSNDDAYGIVAFA<2NSLYK<2VEEME<2DSLVTLNVERLKGTYGRITIAIJ 12b5 
EIG V I 1+ ND+A GI+ F+Y EE L++VRL 

GTYG +T + 

15 Sbjct: 4D EIGNISIVRIIIMKNDNAEGIIEF — 
DPKYTAFEVEEDVGLIMIPVVRLH6TYGYVTADF 17 

fiuery: 12bb E ADGSIS 

DIFPTSGVILFTEGflVLSTITLTILADNIPELSEVVIVTLTRITTEGV 1320 
20 + S + 3> + F fifl LS I ++I+ DN E E + + 

LT T G 
Sbjct: Ifl 

ISflSSSASPGGVDYILHGSTVTFOHGflNLSFINISIIDDNESEFEEPIEILLTGAT — G- 154 
25 (Juery: 1321 

EDSYKGATID<2DRSKSVITTLPNDSPFGLVGURAASVFIRVAEPKENTTTL<3Lc3IAR]>KG 1330 

GA + + +1 +DSPFG++ + S I +A P +T L 

L + R G 

Sbjct : 155 GAVLGRHLvSRIIIA-KSDSPFGVIRFLNflSK-ISIANPN- 

30 STMILSLVLERTGG 20b 

fluery: 13B1 LLGDIAIHLRA(2-PNFLLWVDN(2ATENEDYVL(2ETIIIMKENIKEAHAEV- 
SILPDDLPE 1438 

LLG+I ++ PN + C + + E + 

35 +1 P + E 

Sbjct: 207 LLGEIflVNUETVGPNSflEALLPflNRDIADPV-- 
SGLFYFGEGEGGVRTIILTIYPHEEIE 2L4 

Query- 1431 LEEGFIVTI 1447 
40 +EE FI+ + 

Sbjct: 2b5 VEETFIIKL 273 



Score = 179 (26.9 bits), Expect = 1.4e-34, Sum P(3) = 1.4e-34 
Identities = 65/244 (26%), Positives = 114/244 (46%) 



fiuery: 551 DYIFTPMILHFADGERYKNVNIHILDDDIPEGDEKFfiLILTNPSPGLEL — 
GKN T b33 

P+ + L F DGER + V++ ILDDD PEG E F + LTNP G ++ 

GK + 

50 Sbjct: =154 

»FEE(3TLTLIFLPGERERK:VSV(3ILDI>PEPEG(2EFFYVFLTNP(3GGAiS!IVEGKI>]>TGFAA 1013 

fiuery: fe.34 IALIIVLANDDGPGVLSFNNSEHFFLREPTALYVflESVAVLYIVREPAdG — 
LFGTV baa 

55 A++I+ + D G++ F+ L ++ L + R+P + 

+F V 

Sbjct: 1014 FAMVIITGSDLHNGIIGFSEESGSGLELREGAVMRR — 
LHLIVTRQPNRAFEDVKVFblRV 1071 
Query' bfi*? TV<2 — 

FIVTEVNSSNESKDLTPSK6YIVLEEGVRFKAL(3rSAILPTEPEni>EYFVCTLFN 74b 

T+ +V + + N ++L G G + I + P +++ 

5 YF L+ 

Sbjct: 1072 

TLNKTVVVL(2KDGVNLnEEL(3SVSGTTTCTnG(3TKCFISIELKPEKVPt3VEVYFFVELYE 1131 

fluery: 747 PTGGARLGVHVtf- 
10 TLITVL(3N(3APLGLFSISAVEIMRATSI])IEEANRTVYLNVSRTNGII> 60S 

T GA + + I +L++ L S V +R ++ ++A + L 

V+R +G 

Sbjct: 1132 ATAGAAINNSARFA<2IKILES1>ES(2SLVYFS-VGSRL-AVAHKKAT- 
LISLGVARDSGTG llfifi 


<3uery: flOb LAVSVdbJET fl!4 

L + SV + T 
Sbjct: HflT LMnSVIMFST in? 

Score = 174 (26.1 bits), Expect = 4.1e-32, Sum P(3) = 4.1e-32 
Identities = 58/200 (29%), Positives = 102/200 (51%) 

(3uery: 115*! 

DFIPSSGELIFEPGEREATIAVNILDDTVPEKEESFKV(3LKNPKGGAEIGINI>SVT-ITI 1217 
25 DF+ +SG GE EA+ V++L D VPE EE + +I2L + +GGAE+ + 

S+T ++ 
Sbjct: 35b 

DFLSTSGFFTIADGESEASFJ>VHLLPI>EVPEIEEDYVIt3LVSVEGGAELDLEKSITIjFSV 415 

30 (3uery: 1216 LSNDDAYGIVAFAdNSLYKtJVEEMEdDSL — 
VTLNVERLKGTYGRITIAUEADGSISDIF 1275 

+N3>3> + G+ A + +& + (2+ + + +N+ RL GT + G + + 

SD 

Sbjct: 41b YANDDPHGVFALYSD 

35 R(3SILIG<2NLIRSI(3INITRLAGTFGI>VAVGLRIS SDHK 4bT 

duery: 127b PTSGVILFTEGdVLSTITLTILADNIPELSEVVI 

VTLTRITTEGVEDSYKGA-TI 132T 

V E (2++ T D +P + + V + TL +T V 

40 + G TI 

Sbjct: 470 

E(3PIVTENAERfiLVVKPGATYKVDVVPIKN(3VFLSLGSNFTLi3LVTVriLVGGRFYGI1PTI 52^ 

duery: 1330 DtfDRSKSVITTLPNDSPFGLVGURAAS 135b 
45 (3+ + KS + + + VG+ + + 

Sbjct: 530 L(2E-AKSA VLPVSEKAANSGVGFESTA 555 

Score = 145 (21.8 bits), Expect = 4.3e-24, Sum P(3) = 4.3e-24 
Identities = 104/3% (2bX)-» Positives = 170/3% (42*) 


tfuery: fifi 

EGDEFANLTVSILPDDFPEriDESFLISLLEVHLMNISASLKNflPTIG(3PNISTVVIALNG 147 
+G+ A+ V + LPD + PE++E ++I L+ V A L + +1 + 

+ N 

55 Sbjct: 3bfi DGESEASFDVHLLPDEVPEIEEDYVK2LVSVEG GAELDLEKSI 

TUFSVYAND m«J 
Query: 14fl 

DAFGVFVIYSISPNTSEDGLFVEVflEQPQTLVELniHRTGGSLGQVAVEWRVVGGTATEG 207 
D GVF + YS D + + + +++ I R G+ G VAV R+ 

+ 

5 Sbjct: 420 DPHGVFALYS 

DRQSILIGQNLIRSIQINITRLAGTFGDVAVGLRISSDHKEQP 472 

Query : 20fl LDFIGAGEILTFAEGETKKTVILTXXXXXXXXXXXXXXXXLVYTE-GGSRI- 
-LPSS-DT 2b3 

10 + A L +G T K ++ LV G R 

+P+ 

Sbjct: 473 

IVTENAERQLVVKDGATYKVDVVPIKNQVFLSLGSNFTLQLVTVMLVGGRFYGnPTILQE S32 

15 Query: 2b4 VRVNIL-ANDNVAGI-VSFQTASRSVIGHEGEILQFHVIRTFPGR- 
GNVTVNWKI-IGQN 311 

+ +L. ++ A V F++ + ++ HV+ + G G ++V 

U 

Sbjct: S33 AKSAVLPVSEKAANSQVGFESTAFQLMNITAGTS — 
20 HVMISRRGTY6ALSVAUTTGYAPG 5^D r 

(Juery: 320 LEL — 

NFANFSGQLFFPEGSLNTTLFVHLLDDNIPEEKEVYQVILYDVRTQGVPP 372 

LE+ N G L F G +F+ P E + + L 

25 V++ P 

Sbjct: 511 LEIPEFIVVGNMTPTLGSLSFSHGEQRKGVFLUTFPS-- 
PGUPEAFVLHLSGVQSSA--P b4b 

Query: 373 

30 AGIALLDAQGYAAVLTVEASDEPHGVLNFALSSRFVLLQEANITIQLFINREFG-SLGAI 431 

G L G+ + A EP GV F+ SSR +++ E I+L + R 

FG I 

Sbjct: b47 GGAQL — RSGF 

IVAEIEPMGVFQFSTSSRNIIVSEDTQniRLHVQRLFGFHSDLI fan 


Query: 432 NVTYTTVPGPILS-LKN-QTV — GNLA EPEVDF- 

VPIIGFLILEEGETAAAINITIL 432 

V + Y T G L++ + V G L + EVJ>F + II LEE 

IN+T + 

40 Sbjct: 700 KVSYQTTAGSAKPLEDFEPVQNGELFFQKFQTEVDFEITIINDQ- 
LSEIEEFFYINLTSV 7S6 

Query: 463 E 463 
E 

45 Sbjct: 751 E 75=1 

Score = 142 (21.3 bits), Expect = 5.6e-05, Sum P(3) = 5.6e-05 
Identities = 54/175 (30%), Positives = 76/175 (43%) 

50 Query: 1435 

DLPELEEGFIVTITEVNLVNSDFSTGflPSVRRPGtlEIAEiniEENBDPRGIFMFHVTRGA 14«14 
J)L + G+ TI E N + J> QP +1 I+I +ND+ GI 

F 

Sbjct: lb DLYDFGRGYDFTIQE-NGLQI]) QPP- 

55 EIGNISIVRIIItlKNDNAEGIIEFDPK bb 

Query: 1415 GEVITAYEXXXXXXXXXXXXXXXAGSFGAVNVYU— 
KASPDSAGLEDFKPSHGILEFADK 1552 
TA + E G++G V + ++S S G D+ 

+ F 

Sbjct: b? — - 

YTAFEVEEDVGLIMIPVVRLHGTYGYVTADFISflSSSASPGGVDYILHGSTVTFflHG 123 


Query: 1S53 

aVTAIIIEITIIDDAEFELTETFNISLISVAGGGRLGDBVVVTVVIPflNDSPFGVFGF lbOl 
(3+1 I+IIDD EE E I L GG LG +V ++I 

++DSPFGV F 
10 Sbjct: 124 

dNLSFINISIIDDNESEFEEPIEILLTGATGGAVLGRHLVSRIIIAKSDSPFGVIRF l&O 

Score = 125 (18.8 bits), Expect = 4.0e-25, Sum P(3) = 4.0e-25 
Identities = 77/308 (25%), Positives = 134/308 (43%) 


fluery: 1141 LVGAHSHIYELAYISSHS ©FIP- 

SSGELIFEPGEREATIAVNILDDTVPEKEES 1113 

L G HS + +++Y ++ »F P +GEL F+ + E + I++J> 

+ E EE 
20 Sbjct: bll 

LFGFHSDLIKVSY<2TTAGSAKPLEI>FEPV(3NGELFFl2KF(JTEVI>FEITIINI>(2LSEIEEF 7S0 

fluery: 11^1 FKVflLKNP— 

KGGAEIGINI>SVTITILSNI>l>AYGIVAFAflNSLYK(2VEEriE<3]>SLVTLNV 12S1 
25 F+L+ +G++NS++ D+++N 

D L ++ + 

Sbjct: 751 FYINLTSVEIRGLOKFDVNUSPRLNL DFSVAVITILDN 

DDLAGMDI 7=ib 

30 (Juery: 1252 

ERLKGTYGRITIAWEADGSISDIFPTSGVILFTEGCVLSTITLTILADNIPELSEVVIVT 1311 

++ T+A D+++ S LT + ++ T+ +E 

+ V + 

Sbjct: 7=17 SFPETTVAVAVDTTLIPVETESTTYLSTS- 

35 KTTTILflPTNVVAIVTEATGVSAIP 650 

Query: 1312 

LTRITTEGVEDSYKGATIDflDRSKSVITTLPNDSPFGLVGWRAASVFIRVAEPKENT-TT 1370 

+TG T VTNSG + V+I E 

40 K T T 

Sbjct: flSl EKLVTLHG TPAVSEKPDVATVTANVSIHGTFSLGPSIVYIE- 

EEHKNGTFNT =)□! 

fiuery: 1371 LflLfilARDKGLLGDIAIHLRA (3PNFL LHVDNO — 

45 ATENEDYVLCJETI 1415 

+ + I R G G+++I ++ +PN L + N A E 

ED+ a 

Sbjct: 102 

AEVLIRRTGGFTGNVSITVKTFGERCA£MEPNALPFRGIYGISNLTUAVEEEDFEEl3TLT "Ibl 


Query'. 141b IIMKENIKEAHAEVSILPDDLPELEEGFIVTIT 144S 

+1 + +E V IL DD PE +E F V +T 
Sbjct: e ib2 LIFLDGERERKVSVfllLDDDEPEGflEFFYVFLT IIH 

Score = 123 (18.5 bits), Expect = 6.0e-28, Sum P(3) = 6.0e-28 
Identities = 91/372 (24%), Positives = 150/372 (40%) 
fluery: 3&b VLTVEASDEPHGVLNFALSSRFVLLfiEA — NITI 

(3LFINREFGSLGAINVTYTTV — M3fi 

V TV A+ HG F+L V ++E NT ++ I R G G 

+++T T 

5 Sbjct: flbS VATVTANVSIHGT 

FSLGPSIVYIEEENKNGTFNTAEVLIRRTGGFTGNVSITVKTFGE 125 

Guery: M31 PGtlLSLKN-flTVGNL — 

AEPEVDFVPIIGFLILEEGETAAAINITILEDDVPEL Mfi 1 ) 
10 P L + + NL A E DF LI +GE +++ 

IL+DD PE 
Sbjct: 

RCA(2HEPNALPFRGIY6ISNLTIilAVEEEI>FEE<2TLTLIFLDGERERKVSV(3ILDDDEPEG IfiS 

15 fluery: M^O EEYFLVNLTYVGLTMAASTSFPPRLDSEGLTA — (2VIIDANDGARGVI 

EUdflSRFEV 5MH 

+E+F V LT J> G A VII +D G+I E 

<3S E+ 

Sbjct: lay (3EFFYVFLT 

20 NP<3GGA<2IVEGKDDTGFAAFAMVIITGSDLHNGIIGFSEES(JSGLEL 1DH1 

Query: SMS NE— THGSLTLVAURS-REPLGHVSLF — 
VYAflNLEAflVGLDYIFTPMILHFADGERYKN S=n 

E LL+R V+FV +D+ L 

25 G 

Sbjct: 1DM2 

REGAVnRRLHLIVTRflPNRAFEDVKVFURVTLNKTVVVLflKDGVNLMEELflSVSGTTTCT 1101 

fluery: bOO VNIMILDDDIPEGDEKFflLILTNPSPGLELGKNT- 

30 IALIIVLANDDGPGVLSF bSl 

++I + + +P+ +F+L +G++ AI+L 

+D+ ++ F 
Sbjct: 110S 

HG<3TKCFISIELKPEKVP<3VEVYFFVELYEATAGAAINNSARFA<2IKILESI>ESflSLVYF llbl 


(Juery: bS2 NNSEHFFLREPTALYVt2ESVAVLYIVREPA(2GLFGTVTV<2FIVTEVNSSNE- 
-SKDLTPS 

+ + A + L + R+ GL ++V F E+ S + 

++P+ 

40 Sbjct: HbS SVGSRLAVAHKKATLIS LflVARDSGTGLtl— 

HSVNFST<2ELRSAETIGRTIISPA 1214 

fluery: 71D KGYIVLEEGVRFKALUISAILD 731 

K +++ E + F+ Q S +LD 
45 Sbjct: 1215 ISGKDFVITEGTLVFEPGflRSTVLD 1H3T 

Score = 120 (18.0 bits), Expect = 1.8e-22, Sum P(3) = 1.8e-22 
Identities = 77/316 (24%), Positives = 127/316 (40%) 

50 fluery: 12SS KGTYGRITIAWE ADGS 

ISDIFPTSGVILFTEGflVLSTITLTILADNIPEL 130M 

+GTYG +++AW AG + ++ PT G + F+ G+ + L 

P 

Sbjct: S73 

55 RGTYGALSVAlilTTGYAPGLEIPEFIVVGNflTPTLGSLSFSHGEdRKGVFLWTFPS — PGU b30 
fluery: 130S 

SEVVIVTLTRITTEGVEDSYKGATIDflDRSKSVITTLPNDSPFGLVGURAASVFIRVAEP 13bM 
E ++ L+ GV+ S G <2 RS ++ + P G + + +S 

I V+E 

Sbjct: b31 PEAFVLHLS GVflSSAPGGA — OLRSGFIVAEI 

EPMGVFtJFSTSSRNIIVSE- b?") 


Query: 13bS KENTTTLflLaiARDKGLLGBIAIHLRAOPNFLLHVDNCSATENEPYV- 
LtJETIIIMKENIK 1M23 

+ T ++L + RGD+I+U A ED+ +<2 

+ ++ 

10 Sbjct: b6D --DTQMIRLHVflRLFGFHSDL-IKVSYOTTA 

GSAKPLEDFEPVONGELFFCKFflT 731 

(Juery: 1424 EAHAEVSILPDDLPELEEGFIVTITEVNLVN- 
SDFSTGflPSVRRPGMEIAEItllEENDDP 1462 
15 E E++I+ P L E+EE F + +T V + F +A I 

I +NDD 

Sbjct: 732 

EVDFEITIINDflLSEIEEFFYINLTSVEIRGLflKFDVNUSPRLNLDFSVA VITILDNDPL 711 

20 <2uery: 1463 RGI-FMFHVTRGAGEVITAY 

EXXXXXXXXXXXXXXXAGSFGAVNV YWKASPDSAGLE 153S 

G+FT AVT E V + 

+A+ SA E 
Sbjct: 712 

25 AGtlDISFPETTVAVAVDTTLIPVETESTTYLSTSKTTTILfiPTNVVAIVTEATGVSAIPE 651 

fluery: 153=5 DFKPSHGILEFADKfiVTAniEITIIDDAEFEL 1S7D 

HG ++K A + + ' F L 
Sbjct: 655 KLVTLHGTPAVSEKPDVATVT ANVSIHGTFSL 663 






Score = 113 (17.0 bits), Expect = 1.4e-34, Sum P(3) = 1.4e-34 
Identities = 28/87 (32%), Positives = 50/87 (57%) 



fluery: 115b SHSDFIPSS6ELIFEPGEREATIAVNILDDT — 
35 VPEKEESFKVfiLKNPKGGAEIG-INDS 1215 

S DF+ + G L+FEPG+R + V + +T + + F++ L +PKGGA 

I + + 
Sbjct: 121b 

SGKDFVITEGTLVFEPGflRSTVLDVILTPETGSLNSFPKRFfllVLFDPKGGARIDKVYGT 1275 

40 

fluery: 1213 VTITILSNDDAYGIVAFAc3NSLYK(2VEE 12M0 

IT++S+ »+ I A + L++ V + 
Sbjct: 127b ANITLVS]>ADS<2AIIi)GLA-D<3LHi2PVND 13D2 

Score = 93 (14.0 bits), Expect = 4.1e-32, Sum P(3) = 4.1e-32 
Identities = 57/222 (25%), Positives = 90/222 (40%) 

fluery: 1404 TENEDYVL--<2ETIIiriKENIKEAHAE VSILPDDLPEL 

EEGFIVTITEVN 1451 

50 TE+ Y+ + T 1+ N+ E VS +P+ L L E+ 

+ T+T 

Sbjct: 61b 

TESTTYLSTSKTTTILOPTNVVAIVTEATGVSAIPEKLVTLHGTPAVSEKPDVATVTANV 675 

55 fluery: 1452 LVNSDFSTGCJPSVRRPGflEIAEIillEENDDPRGIFnFHVTRGAGEV- 
ITAYEXXXXXXXX 151D 

++ FS G PS+ +IEM + ++ GVIT 
Sbjct: S7b SIHGTFSLG-PSI 

VYIEEEFIKNGTFNTAEVLIRRTGGFTGNVSITVKTFGERCAflM ^30 
fluery: 1511 

5 XXXXXXXAGSFGAVNVYIdKASPDSAGLEDFKPSHGILEFADKflVTAMIEITIIDDAEFEL 1S70 

G +G N+ U EDF+ L F D + + + 

I+DD E E 

Sbjct: ^31 EPNALPFRGIYGISNLTWAVEE— — 
EDFEEflTLTLIFLDGERERKVSVfllLDDDEPEG 1&S 


fluery: 1571 TETFNISLISVAGGGRL — GDD VVVTVVIPflNDSPFGVFGFEEKTVS 

IblS 

EF+L+G6++GD V+I +D 6+ GF E++ S 

Sbjct: ^flfc, flEFF YVFLTNPflGGAfllVEGKDDTGFAAFAJIVIITGSDLHNGIIGFSEESflS 
15 1037 

Score = 93 (14.0 bits), Expect = 1.0e-18, Sum P(3) = 1.0e-18 
Identities = 51/238 (21%), Positives = 107/238 (44%) 

20 fluery: bOD VNIMILDDDIPEGDEKFflLILTNPSPGLELGKNT- 
IALIIVLANDDGPGVLSFNNSEHFF b5B 

++I + + +p+ +F+L +G++ AI+L + D + ++ 

F + 

Sbjct: 110*1 

25 ISIELKPEKVPflVEVYFFVELYEATAGAAINNSARFAfllKILESDESflSLVYFSVGSRLA llbfi 

fluery: bST LREPTALYVflESVAVLYIVREPAflGLFGTVTVflFIVTEVNSSNE — 

SKDLTPS KGYI 713 

+ A + L + R+ GL ++V F E+ S+ 

30 ++P+ K ++ 

Sbjct: llbT VAHKKATLIS LflVARDSGTGLM — 

MSVNFSTflELRSAETIGRTIISPAISGKDF V 1251 

fluery: 71M VLEEGVRFKALfllSAILDT — EPE MDEY FVCTLFNPTGGARLG- 

35 VHVflTLITVL 7b4 

+ E + F+ fl S +LD PE ++ + F LF+P GGAR+ V+ 

IT++ 

Sbjct: 1255 

ITEGTLVFEPGORSTVLDVILTPETGSLNSFPKRFfllVLFDPKGGARIDKVYGTANITLV 15fll 


fluery: 7bS ANflAPLGLFSISAVENRATSIDI- 
EEANRTVYLNVSRTNGIDLAVSVflUETVSETAFGMR fiE3 

+ ++ ++ ++ + Dl T+ + V+ T D +S + 

+ 

45 Sbjct: 1SAS SDADSflAHilGLADQLHflPVNDDILNRVLHTISNKVA- 
TENTDEflLSAnMHLIEKIT — TE 133fi 

fluery: BS4 GMDVVFSV 631 
G FSV 
50 Sbjct: 133=1 GJCIflAFSV 134b 

Score = IB (13.fi bits)-. Expect = l-Se-ES-. Sum P(3) = LSe-SS 
Identities = 44/177 (24%), Positives = 82/177 (46%) 

55 fluery: bBD 

PAQGLFGTVTVflFIVTEVNSSNESKDLTPSKGYIVLEEGVRFKALfllSAILDTEPEnDEY 73 e 5 
P +G++G ++VE+ E+LT ++ +GR+++ +D 

EPE E+ 
Sbjct: 13b PFRGIYGISNLTUAVEEEDF — EEfiTLT 

LIFLDGERERKVSVfllLDDDEPEGQEF <\6B 

Query: 740 FVCTLFNPT6GARL 

5 GVHVt2TLITVL<!N«3APLGLFSISAVENRATSIDIEEAN- ?<U 

F L NP GGA++ G ++ + + g+ S E + 

+++ E 

Sbjct: 161 FYVFLTNPfiGGACIVEGKDDTGFAAFAIIVIITGSDLHNGIIGFS-- 
EESfiSGLELREGAV 104b 


fluery: 712 -RTVYLNVSRT-NGIDLAVSVaUE-TVSETAF 

GnRGriDVVFSVFflSFLDESASGIil 643 

R ++L V+R N V V hi T+++T G+ 11+ + SV + 

Sbjct: lew? 

15 tlRRLHLIVTRflPNRAFEDVKVFURVTLNKTVVVLQKDGVNLnEELQSVSGTTTCTIIGQTK 110b 

Query: 644 CFFTLE 641 
CF ++E 

Sbjct: 1107 CFISIE 1112 


Score = 91 (13.7 bits), Expect = 6.6e-32, Sum P(3) = 6.6e-32 
Identities = 49/153 (32%), Positives = 70/153 (45%) 

Ouery: mbb 

RPGMEIAEInTEENDDPRGIFMFHVTRGAGEVTTAYEXXXXXXXXXXXXXXXAGSFGAVN 1S2S 
R G +AEI +P G+F F + + +1 + + 

+ + 

Sbjct: bS2 RSGFIVAEI EPMGVFOFSTS— 

SRNIIVSEDTaMIRLHVflRLFGFHS]) LIK 700 

Query: 152b VYWKASPDSAG-LEDFKP- 
SHGILEFADKQVTAMIEITIIDDAEFELTETFNISLISVAG 1563 

V ++ + SA LEDF+P +G L F Q EITII+D E+ E F 

I+L SV 

Sbjct: 701 

VSYflTTAGSAKPLEDFEPViaNGELFFfiKFCJTEVDFEITIINBQLSEIEEFFYINLTSVEl 7b0 

Query: 1564 GG — RLGDP VVVT VV-IP(3NDSPFGV-FGFEEKTVS IblS 

G RL D V V+ I N» G+ F E TV + 

40 Sbjct: 7bl RGLQKFDVNWSPRLNLDFSVAVITILDNDDLAGHPISFPETTVA 604 

Score = b5 (1-6 bits)-, Expect = 6.6e-21-, Sura P(3) = 6-66-21 
Identities = 2b/11 (2b*)-, Positives = SO/IT (SO*) 

45 fluery: 1232 NSLYKfiVEEMEflDSLVTLNVERLKGTYGRITIAIilEADGS ISDIF — 

PTSGVILFTE 1265 

NS K+ + + D ++++ GT IT+ +AD ++D P 

+ IL 

Sbjct: 1250 NSFPKRFOIVLFDPKGGARIDKVYGT- 
50 ANITLVSDADSfiAIUGLADflLHflPVNDDIL 1305 

£3uery: 126b GtJVLSTITLTILADNIPELSEVVIVTLTRITTEGVEDSYKGAT 1326 

+VL TI++ + +N E ++ + +ITTEG ++ A+ 
Sbjct: 130b NRVLHTISHKVATENTPE(2LSAnriHLIEKITTEGKI(3AFSVAS 1346 

55 

Score = 46 (7-2 bits)-, Expect = l-le-27-, Sum P(3) = l-le-27 
Identities = 23/115 (20*)-. Positives = 44/115 (36*) 



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tfuery: 14^ TAYEXXXXXXXXXXXXXXXAGSFGAVNVYWKAS 

PDSAGLEDFKPSHGILEFAD 1551 

TA++ G++GA++V Id P+ + + Ph 

G L F + 
5 Sbjct: 554 

TAFfiLriNITAGTSHV/HISRRGTYGALSVAUTTGY APGLEIPEFIVVGNflTPTLGSLSFSH bl3 

<2uery: 1552 K<3 VTAMIEITIIDDAEFELTETFNISLI-- 
SVAGGGRLGDDVVVTVVIPflNDSPFGVFGF lbO 1 } 
10 + + + + ++S + S GG +L +V + 

P GVF F 

Sbjct : fall GE(2RKGVFLUTFPSPGWPEAFVLHLSGVtfSSAPGGA<2LRSGFIVAEI 

EPNGVFflF bbE 






Pedant information for DKFZphamy2_10p7, frame 3
Report for DKFZphamy2_10p7.3



ELENGTHJ lb!5 

EMbO 177bDD.-5a 

EpI3 4-37 

25 {EH0N0L3 TREHBL : AFD550fiM_l gene: "VLGR1"; product: "very 

large G-protein coupled receptor-l" V Homo sapiens very large G- 
protein coupled receptor-1 (VLGR1) mRNA-i complete cds- 5e-24 
EBLOCKSl BP014T3A 

EBL0CKS3 BLDD713B Sodium : dicarboxylate symporter family proteins 

30 

EBLOCKS J PRD10D3A 

(CBL0CKS3 PR00412C 

EBL0CKS3 BL0DS2ME 

EPIRKIO heart le-Dfl 

35 EPIRKtO ion transport le-Ofi 

EPIRKhO transmembrane protein 3e-D6 

EPIRKliO phosphoprotein 2e-0fi 

EPIRKliO membrane protein le-oa 

EPROSITEJ HULTIC0PPER_0XIPASE1 . 1 
40 JLKW1 Alljeta 

EKIO L0W_C0MPLEXITY 2-bO 

SE(2 1> A WAI>AIJ A LYTC A TLCLKE(3 ACS A FSFFS ASEGP(3CFUriTSLIISP A VNNSDFUT YRKNMT 

45 SEG . . . xxxxxxxxxxx 

PRD ccchhhhhhhhchhhhhhhhhhheeeeeecccccceeeeeeeccccccccceeeecccee 

SE(2 RVASLFSGt33AVAGSDYEPVTR(2UAIM(3EGI>EFANLTVSILPI>I>FPEni>ESFLISLLEVHL 

SEG - 

50 PRD eeeeeccccccccccceeeceeeeeeccccceeeeeeeeccccccchhhhhhhhhhhhhh 

SE<2 nNISASLKN^PTIGQPNISTVVIALNGDAFGVFVIYSISPNTSEl>GLFVEV(3E(2P(3TLVE 

SEG 

PRD hccccccccccccccccceeeeeeecccceeeeeeeeecccccccceeeeeeecccceee 



SE<2 LriIHRTGGSLG(3VAVEbJRVVGGTATEGLDFIGAGEILTFAEGETKKTVILTILDDSEPED 

SEG • xxxxxxxxx 

PRD eeeeecccccceeeeeeecccccccccccccccceeeeeccccceeeeeeeeeccccccc 
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SEfl DESIIVSLVYTEGGSRILPSSDTVRVNILANDNVAGIVSFflTASRSVIGHEGEILflFHVI 

SEG xxx xxx x 

PRD ccceeeeeeeccccccccccccceeeeeeccccceeeeeeeccceeeeccccceeeeeee 

5 

SEfl RTFPGRGNVTVNWKIIGflNLELNFANFSGflLFFPEGSLNTTLFVHLLDDNIPEEKEVYflV 

SEG 

PRD eccccccceeeeeeeecccccccccccccceeecccceeeeeeeeeecccccccccceee 

10 SEfl ILYDVRTflGVPPAGIALLDAflGYAAVLTVEASDEPHGVLNFALSSRF VLLflEANITIflLF 

SEG 

PRD eeccceeeeccchhhhhhhhccccceeeeeecccccceeeeeeceeeeeecccccceeee 

SEfl INREFGSLGAINVTYTTVPGMLSLKNflTVGNLAEPEVDFVPIIGFLILEEGETAAAINIT 

15 SEG - . 

PRD cccccccceeeeeeecccccccccccccccccccccceeeeeeeeeeeccccccccceee 

SEfl ILEDDVPELEEYFLVNLTYVGLTflAASTSFPPRLDSEGLTAflVIIDANDGARGVIEIiJflflS 

SEG 

20 PRD eccccchhhhhheeeeeeeecceeecccccccccccccceeeeeeeccccceeeeeeccc 

SE(3 RFEVNETHGSLTLVAflRSREPLGHVSLFVYAflNLEAflVGLDYIFTPNILHFADGERYKNV 

SEG 

PRD eeeecccccceeeeeeccccccceeeeeeeeccccccccccccccceeeecccccceeee 

25 

SEfl NIMILDDDIPEGDEKFflLILTNPSPGLELGKNTIALIIVLANDDGPGVLSFNNSEHFFLR 

SEG 

PRD eeeeeccccccccceeeeeeeccccccccccceeeeeeeecccccceeeeeeccceeeee 

30 SEfl EPTALYVflESVAVLYIVREPAflGLFGTVTVflFIVTEVNSSNESKDLTPSKGYIVLEEGVR 

SEG 

PRD ccceeeeccchhhhhhhhhcccccceeeeeeeeeeeccccccccccccccceeeeeccce 

SEfl FKALfllSAILDTEPEMDEYFVCTLFNPTGGARLGVHVflTLITVLflNflAPLGLFSISAVEN 

35 SEG 

PRD eeeeeeeeecccchhhhhhheeeeecccccceeehhhhhhhhhhhhhcccceeeeeecch 

SEfl RATSIDIEEANRTVYLNVSRTNGIDLAVSVflUETVSETAFGMRGMDVVFSVFflSFLDESA 

SEG 

40 PRD hhhhhccccccceeeeeeeccccchhhhheeeeeccceeeeccccceeeeeeeecccccc 

SEfl SGWCFFTLENLIYGIIILRKSSVTVYRUflGIFIPVEDLNIENPKTCEAFNIGFSPYFVITH 

SEG 

PRD cceeeeeccccccceeecccceeeecccceeeccceeeecccccccQeecccccceeeee 

45 

SEfl EERNEEKPSLNSVFTFTSGFKLFLVflTITILESSflVRYFTSDSflDYLIIASflRDDSELTfl 

SEG 

PRD hhhhhcccceeeeeeecccceeeeeceeGCCcccceeeeccccceeeeeeecccccceee 

50 SEfl VFRUNGGSFVLHflKLPVRGVLTVALFNKGGSVFLAISflANARLNSLLFRUSGSGFINFflE 

SEG 

PRD eeeeccceeeeeeccccceeeeeeeeccccceeeeeeehhhhhheeeeeecccccceeee 

SEfl VPVSGTTEVEALSSANDIYLIFAKNVFLGDflNSIDIFIUEMGflSSFRYFflSVDF AAVNRI 

55 SEG 

PRD eeccccceeeeccccceeeeeeeeeeeecccceeeeeeeeccccceeeeeeccceeeece 

SEfl HSFTPASGIAHILLIGflDMSALYCldNSERNflFSFVLEVPSAYDVASVTVKSLNSSKNLIA 
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SEG • • • 

PRD eecccccceeeeeeeccccceeeeecccccceeeeeeeccccceeeeeeeccccccceee 

SE<3 L VGAHSHIYELAYISSHSDFIPSSGELIFEPGEREATIAVNILDDTVPEKEESFKVflLKN 

5 SEG - 

PRD eeccceeeeeeeeeeccccccccceeeeecccchhhhheeeeeccccccccceeeeeeec 

SEfl PKGGAEIGINDSVTITILSNDDAYGIVAFAONSLYKflVEEIIEtJDSLVTLNVERLKGTYGR 

SEG 

10 PRD ccccceeecccceeeeeecccccchhhhhhccchhhhhhhhhhhhhhhhhhhccccceee 

SE<3 ITIAUEADGSISDIFPTSGVILFTEGtJVLSTITLTILADNIPELSEVVIVTLTRITTEGV 

SEG 

PRD eeeeeeeccceeeeeccccceeeeccccccceeeeeecccceeeeeeeeeeeeeeceeee 

15 

SE(3 EDSYKGATIDQDRSKSVITTLPNDSPFGLVGURAASVFIRVAEPKENTTTLflL(2IARDKG 

SEG 

PRD cceeeeeeeecccceeeeeecccccccceeehhhhhhGeeeeccccccccceeeeccccc 

20 SE(2 LLGDIAIHLRA(3PNFLLHVDN<3ATENEDYVL<2ETIIIMKENIKEAHAEVSILPDDLPELE 

SEG - 

PRD ccccccceeecccceeeeeccccccccceeeeeeeeeecccchhhhheeeeccccccccc 

SE<2 EGFIVTITEVNLVNSDFSTGflPSVRRPGMEIAEIMIEENDDPRGIFMFHVTRG AGE VITA 

25 SEG -•• • 

PRD cceeeeeeeeeeccccccccccccccccchhhhhhhhcccccceeeeeeeeccccceeee 

SEC YEVPPPLNVLGVPVVRLAGSFGAVNVYUKASPDSAGLEDFKPSHGILEFADKGVTANIEI 

SEG • • xxxxxxxxxxxxxxx •• 

30 PRD eeccccceeeeeeeeeecccccceeeeeeccccccccccccccceeeeecccceeeecce 

SEC TIIDDAEFELTETFNISLISVAGGGRLGDDVVVTVVIPdNDSPFGVFGFEEKTVS 

SEG 

PRD eeechhhhhhhhcceeeeeeecccccccceeeeeeeecccccccceeeecccccc 

35 

Prosite for DKFZphamy2_10p7.3

PS00079 151->171 MULTICOPPER_OXIDASE1 PDOC00076

(No Pfam data available for DKFZphamy2_10p7.3)
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5 group: transmembrane protein 

DKFZphamy2_11d2 encodes a novel 552 amino acid protein without
similarity to known proteins.

The novel protein contains 2 transmembrane regions.

No informative BLAST results; no predictive PROSITE, Pfam or
SCOP motifs.

The new protein can find application in studying the expression
profile of amygdala-specific genes and as a new marker for
amygdala cells.



20 



unknown protein
Pedant: TRANSMEMBRANE 2
Sequenced by EMBL
Locus: /map="lLpl3 . 3"
Insert length: 2939 bp

Poly A stretch at pos. 2920, polyadenylation signal at pos. 2869
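
The two 3' end features reported above can be re-derived from the sequence itself. The following is a minimal sketch, not the method used to produce the report: it assumes the canonical AATAAA hexamer as the polyadenylation signal, treats a terminal run of at least ten A residues as the poly A stretch, and reads the reported positions (2920 and 2869) as 0-based offsets, which is an inference from the listing. The function names and the `TAIL_START` constant are hypothetical.

```python
import re


def polya_signal_offset(seq, motif="AATAAA"):
    """Return the 0-based offset of the last occurrence of `motif`, or None."""
    idx = seq.upper().rfind(motif)
    return idx if idx >= 0 else None


def polya_tail_offset(seq, min_len=10):
    """Return the 0-based offset where a terminal run of A begins, or None."""
    match = re.search(r"A+$", seq.upper())
    if match and len(match.group()) >= min_len:
        return match.start()
    return None


# Last two rows of the insert (rows 2851 and 2901 of the listing).
TAIL_START = 2850  # number of bases that precede this fragment (assumption)
tail = (
    "CCACCAAGGAAGAAAGCAGAATAAACATTTTTGCACTGCCTGAAAAACCC"
    "CGGTGGTCAGGCGTGAGCCTAAAAAAAAAAAAAAAAAAA"
)

signal_pos = TAIL_START + polya_signal_offset(tail)
tail_pos = TAIL_START + polya_tail_offset(tail)
print(signal_pos, tail_pos)
```

Adding 1 to either value gives the conventional 1-based coordinate of the first base of the feature.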

30 

1 GGCGGGTGAG AGGCCGCGGC GGCAGGTCCA CCTGGGCTTG CGAAGGCACA
51 GATTCCCCGT CCACAGCTCA CGACCAGATG CACCAGCAGG AGTCCACATC
101 GAGGACGTCC TCCGGGCACT CCCACGACCA GTGACCAGGA GTTAAACTTT
151 GGGATGTGCC CGTGATGTTG GACCACAAGG ACTTAGAGGC CGAAATCCAC
201 CCCTTGAAAA ATGAAGAAAG AAAATCGCAG GAAAATCTGG GAAATCCATC
251 AAAAAATGAG GATAACGTGA AAAGCGCGCC TCCACAGTCC CGGCTCTCCC
301 GGTGCCGAGC GGCGGCGTTT TTTCTTTCAT TGTTTCTCTG CCTTTTTGTG
351 GTGTTCGTCG TCTCATTCGT CATCCCGTGT CCAGACCGGC CGGCGTCACA
401 GCGAATGTGG AGGATAGACT ACAGTGCCGC TGTTATCTAT GACTTTCTGG
451 CTGTGGATGA TATAAACGGG GACAGGATCC AAGATGTTCT TTTTCTTTAT
501 AAAAACACCA ACAGCAGCAA CAATTTCAGC CGATCCTGTG TGGACGAAGG
551 CTTTTCCTCT CCCTGCACCT TTGCAGCTGC TGTGTCGGGG GCCAACGGCA
601 GCACGCTCTG GGAGAGACCT GTGGCCCAAG ACGTGGCCCT CGTGGAGTGT
651 GCTGTGCCCC AGCCAAGAGG CAGTGAGGCA CCTTCTGCCT GCATCCTGGT
701 GGGCAGACCC AGTTCTTTCA TTGCAGTCAA CTTGTTCACA GGGGAAACCC
751 TGTGGAACCA CAGCAGCAGC TTCAGCGGGA ATGCGTCCAT CCTGAGCCCT
801 CTGCTGCAGG TGCCTGATGT GGACGGCGAT GGGGCCCCAG ACCTGCTGGT
851 TCTCACCCAG GAGCGGGAGG AGGTTAGTGG CCACCTCTAC TCCGGCAGCA
901 CCGGGCACCA GATTGGCCTC AGAGGCAGCC TTGGTGTGGA CGGGGAAAGT
951 GGCTTCCTCC TTCACGTCAC CAGGACAGGT GCCCACTACA TCCTCTTTCC
1001 CTGCGCAAGC TCCCTCTGCG GCTGCTCTGT GAAGGGTCTC TACGAGAAGG
1051 TGACCGGGAG CGGCGGCCCG TTCAAGAGTG ACCCGCACTG GGAGAGCATG
1101 CTCAATGCCA CCACCCGCAG GATGCTTTCC CACAGCTCTG GAGCAGTGCG
1151 CTACCTGATG CATGTCCCAG GGAACGCCGG TGCAGATGTG CTTCTTGTGG
1201 GCTCAGAGGC CTTCGTGCTG CTGGACGGGC AGGAGCTGAC GCCTCGCTGG
1251 ACACCCAAGG CAGCCCATGT CCTGAGAAAA CCCATCTTCG GCCGCTACAA
1301 ACCAGACACC TTGGCTGTAG CCGTTGAAAA CGGAACTGGC ACCGACAGAC
1351 AGATCCTGTT TCTGGACCTT GGCACTGGAG CCGTCCTGTG TAGCCTAGCC
1401 CTCCCGAGCC TCCCTGGGGG TCCACTGTCC GCCAGCCTGC CGACCGCAGA
1451 CCACCGCTCA GCCTTCTTCT TCTGGGGCCT CCACGAGCTG GGGAGCACCA
1501 GCGAGACGGA GACCGGGGAG GCCCGGCACA GCCTGTACAT GTTCCACCCC
1551 ACCCTGCCGC GCGTGCTGCT GGAGCTGGCC AATGTCTCTA CCCACATTGT
1601 CGCCTTTGAC GCCGTCCTGT TTGAGCCAAG CCGCCACGCC GCCTACATCC
1651 TTCTGACAGG CCCGGCAGAC TCAGAGGCAC CCGGCCTGGT CTCTGTGATC
1701 AAGCACAAGG TGCGGGACCT TGTCCCAAGC AGCAGGGTGG TCCGCCTGGG
1751 TGAGGGTGGG CCAGACAGTG ACCAAGCCAT CAGGGACCGG TTCTCCCGGC
1801 TGCGGTACCA GAGTGAGGCG TAGAGGCACG CCAGCCAGAG CCTGTGGAGA
1851 GACTCCGCCT GCTGACACTA AACGTCCTGG GAAGTGGGCC CTTCCCTGGG
1901 TCTCTGCACT GACTCCCCCA CTCCTGACCC TGGTGATGGT CGCCACTGGG
1951 CAGCAGCAGC CTTACCAGTC CTCCATGATC ACACCCAGGG ACCTGCATGG
2001 GTGAGGGGAC ACCCTGGGCC TCTCTCCCGC CCAGCATCCT CCCTGAGTCC
2051 CCACACAGGG CCTCACTCTG CACCCCACCA GGGTCCCGCT CACACCAGGC
2101 AGCCTTCATA GTGGTCTCCC TGGCCACCTT GGGCAGAGCT GGGTCATGCA
2151 GCACCCCATC CTTACCCGGT GCCCTCTCCT TGCCAGCTTC TCCCCAGGCC
2201 AGAGCGGCCA TCGCGTAGAA AGAACCAGGG TGTCCCCGGG ACAGGCCGTC
2251 CCCCACCCCA TCCTGTAGAG TCCATTCCCC TTTTCCCTCC TGTGCTCTGT
2301 CCCCCAAGGA GTCATGGAAC TCAGGGTACT GGGCCTCAAC GGGAACCTGA
2351 GACAGCTTCC AGCTTCGCAG CCCTTCCCGG AGCTACAGGG GGATCCTCTA
2401 GCATGGGGGG TGTGACTTGG TTCCTTTGAC CAGGTCCTGT GAGGAAGCCT
2451 GGAGCAAGGG TCTCCCCCAG CAGGATGGGT GGGGCCTGCT CTGGAGCTGA
2501 GCCCGTGGCC GCTCACAGGT GTCGTTAGTG GTGTTGCAGC TGTCTACTGG
2551 CTGCATGTGC TGTGAATATC CCAAGGAACT GGCTGTGGAA TGCGTGTTTG
2601 GGTCAGTCTG TGCCCTCTCA GTAGACACTG GAGCTGCTCT GTCCCTGAAG
2651 AGGCCCCGTG CCCCAGGCAT GGCAAGCGCC TGCCTCTCCC CTTCCGGTGC
2701 TCACACGCCC ACGCCGTGCC ACCCGATGCA GGACTCACCT CTGTGCCTTG
2751 CTGCTCCTGA GGCCCAAGGG CAGCCATGGT GCTCTGTACT GCTCGGGCCG
2801 CCCAGGTCAC AGAGCCTGAG CTTCGTAGCC AAAGCAGCCT GATGACCCAC
2851 CCACCAAGGA AGAAAGCAGA ATAAACATTT TTGCACTGCC TGAAAAACCC
2901 CGGTGGTCAG GCGTGAGCCT AAAAAAAAAA AAAAAAAAA
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BLAST Results 
No BLAST result

Medline entries

No Medline entry

Peptide information for frame 2

ORF from 2555 bp to 2839 bp; peptide length: 95
Category: questionable ORF
Classification: unclassified

1 MCCEYPKELA VECVFGSVCA LSVDTGAALS LKRPRAPGMA SACLSPSGAH
51 TPTPCHPMQD SPLCLAAPEA QGQPWCSVLL GPPRSQSLSF VAKAA
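
The frame number and peptide length quoted for each ORF follow arithmetically from its coordinates. A minimal sketch of that arithmetic, assuming 1-based inclusive coordinates that exclude the stop codon and reading the garbled figures for this ORF as 2555 to 2839 (95 residues); `orf_summary` is a hypothetical helper, not part of any annotation tool:

```python
def orf_summary(start, end):
    """Return (frame, peptide_length) for an ORF.

    `start` and `end` are 1-based inclusive coordinates of the first and
    last coding base; the stop codon is assumed to lie outside the range.
    """
    span = end - start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    frame = (start - 1) % 3 + 1  # frame 1 starts at base 1, frame 2 at base 2, ...
    return frame, span // 3


print(orf_summary(2555, 2839))
```

The same check applied to coordinates 165 to 1820 gives frame 3 and 552 residues.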
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30 
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BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphamy2_11d2, frame 2

TREMBL:MMIGCF_2 Mouse ig gamma2a-b (c57bl/6 allele) c gene and
secreted tail, N = 1, Score = 73, P = 0.1



>TREMBL : MMIGCF_2 Mouse ig gamma2a-b ( c57bl/b allele) c gene and 
secreted 

tail. 

15 Length = 33H 

HSPs: 

Score = 73 (11.0 bits), Expect = 1.1e-01, P = 1.0e-01
Identities = 16/49 (32%), Positives = 27/49 (55%)

fluery: *m LSPSGAHTPTPCHPM(3DSPLCLAAPEAl2G<3Pli!CSVLLGPPRS<2SLSFVA 12 

•»■ P T PC P+++ P C AAP+ G P SV + PP+ + + ++ 
Sbjct: lb IEPRVPITflNPCPPLKECPPC-AAPDLLGGP-- SVFIFPPKIKDVLMIS 
25 mi 
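
The percentages printed beside the identity and positive counts can be checked from the counts alone. The sketch below assumes the report truncates rather than rounds, which is inferred from figures such as 16/49 being shown as 32%; `blast_percent` is a hypothetical name.

```python
def blast_percent(matches, aligned_length):
    """Return matches/aligned_length as an integer percentage, truncated."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matches) // aligned_length


print(blast_percent(16, 49), blast_percent(27, 49))
```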



Peptide information for frame 3

ORF from 165 bp to 1820 bp; peptide length: 552
Category: putative protein

Classification: Transmembrane proteins unclassified

1 MLDHKDLEAE IHPLKNEERK SQENLGNPSK NEDNVKSAPP QSRLSRCRAA
51 AFFLSLFLCL FVVFVVSFVI PCPDRPASQR MWRIDYSAAV IYDFLAVDDI
101 NGDRIQDVLF LYKNTNSSNN FSRSCVDEGF SSPCTFAAAV SGANGSTLWE
151 RPVAQDVALV ECAVPQPRGS EAPSACILVG RPSSFIAVNL FTGETLWNHS
201 SSFSGNASIL SPLLQVPDVD GDGAPDLLVL TQEREEVSGH LYSGSTGHQI
251 GLRGSLGVDG ESGFLLHVTR TGAHYILFPC ASSLCGCSVK GLYEKVTGSG
301 GPFKSDPHWE SMLNATTRRM LSHSSGAVRY LMHVPGNAGA DVLLVGSEAF
351 VLLDGQELTP RWTPKAAHVL RKPIFGRYKP DTLAVAVENG TGTDRQILFL
401 DLGTGAVLCS LALPSLPGGP LSASLPTADH RSAFFFWGLH ELGSTSETET
451 GEARHSLYMF HPTLPRVLLE LANVSTHIVA FDAVLFEPSR HAAYILLTGP
501 ADSEAPGLVS VIKHKVRDLV PSSRVVRLGE GGPDSDQAIR DRFSRLRYQS
551 EA
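
The frame 3 peptide can be cross-checked against the nucleotide listing by translating from the ORF start. A minimal sketch under the standard genetic code; the `translate` helper and the choice of fragment (bases 151 to 200 of the insert, with the ORF start at base 165 read from the report) are illustrative assumptions:

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code, built in TCAG order; '*' marks a stop codon.
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna, start=1):
    """Translate from 1-based position `start` until a stop codon or the end.

    Codons containing characters other than T, C, A, G become 'X'.
    """
    dna = dna.upper()
    peptide = []
    for pos in range(start - 1, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[pos:pos + 3], "X")
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# Bases 151-200 of the insert; base 165 is the 15th base of this fragment.
fragment = "GGGATGTGCCCGTGATGTTGGACCACAAGGACTTAGAGGCCGAAATCCAC"
print(translate(fragment, start=15))
```

Run over the full insert with `start=165`, the same function would be expected to stop at the TAG codon that follows base 1820.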



50 

BLASTP hits 

No BLASTP hits available 
55 Alert BLASTP hits for DKFZphamy2_lld2-. frame 3 

No Alert BLASTP hits found 
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Pedant information for DKFZphamyE_JLldE n frame S 



Report for DKFZphamy2_11d2.2

5 

lELENGTHD 

0111113 1757.38 
CpIJ b-bfl 
10 IEBL0CKS3 PR005E1E 
[KU3 Alpha_Beta 

SEQ MCCEYPKELAVECVFGSVCALSVDTGAALSLKRPRAPGMASACLSPSGAHTPTPCHPMQD
PRD cccchhhhhhhhhccceeeeeecccchhhhhccccccccccccccccccccccccccccc

SEQ SPLCLAAPEAQGQPWCSVLLGPPRSQSLSFVAKAA
PRD ccccccccccccccceeeeccccccchhhhhhccc

(No Prosite data available for DKFZphamy2_11d2.2)
(No Pfam data available for DKFZphamy2_11d2.2)

25 

Pedant information for DKFZphamy2_11d2, frame 3
Report for DKFZphamy2_11d2.3

30 

lELENGThO 55E 
EriliO S^bS^.Lfi 
CpIJ 5-flM 
35 EBL0CKS3) PRD0211G 

DEBLOCKS! BLODEfifiC Tissue inhibitors of metal loproteinases 
proteins 

EBL0CKS3 PRQ043bA 
[EKLO TRANSMEMBRANE E 

40 (EKIO L0U_C0MPLEXITY 6-15 X 

SE<2 HLDHKDLEAEIHPLKNEERKS(2ENLGNPSKNEDNVKSAPP(3SRLSRCRAAAFFLSLFLCL 

SEG xxxxxxxxx 

45 PRD ccchhhhhhhcccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhh 

MEM - • MMMMMMM 

SE(2 FVVFVVSF VIPCPDRPAS(2RMIJRID YSAAVIYDFLA VDDINGDRIGDVLFLYKNTNSSNN 

SEG xxxxxxxxx • - . • - • • • 

50 PRD hhhhhhhccccccccccchhhhhhhchhhhhhhhhccccccccchhhhhhhccccccccc 

MEM MMMMMMMMMM - . . 

SE(2 FSRSCVDEGFSSPCTFAAAVSGANGSTLWERPVAt3DVALVECAVP(3PRGSEAPSACILVG 

SEG • 

55 PRD ccccccccccccccccccccccccccccccccccchhhhhhhccccccccccceeeeeec 

MEM • • • . 

SEt3 RPSSFIAVNLFTGETLWNHSSSFSGNASILSPLL<2VPDVDGDGAPDLLVLT£EREEVSGH 
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SEG • • 

PRD cccceeeeeccccccccccccccccceeeecceeecccccccccccchhhhhhhhhhhcc 

MEM 

5 SEC LYSGSTGHlHGLRGSLGVDGESGFLLHVTRTGAHYILFPCASSLCGCSVKGLYEKVTGSG 

SEG • 

PRD cccccccccccccccccccccceeeeeeecccceeeeeccccccccccceeeeccccccc 

mem • 

10 sea gpfksdphuesmlnattrrmlshssgavrylmhvpgnagadvllvgseafvlldgfleltp 

SEG 

PRD ccccccccccccchhhhhhhhhcccccceeeccccccccceeeeeccceeeeeccccccc 

MEM 

15 SE<2 RWTPKAAHVLRKPIFGRYKPDTLAVAVENGTGTDRfllLFLDLGTGAVLCSLALPSLPGGP 

SEG • xxxxxxxxxxx 

PRD ccchhhhhhcccccccccccceeeeeeccccccceeeeeeeccccceeeeeeeccccccc 

mem nnnnrinritinnniinnririri 

20 SE(2 LSASLPTADHRSAFFFUGLHELGSTSETETGEARHSLYI1FHPTLPRVLLELANVSTHIVA 

SEG xxxxxx xxxxxxxxxx- 

PRD ccccccccccccceeeccccccccccccccccccccceeeccccccccccccccceeeee 

MEM .- 

25 SEtf FDAVLFEPSRHAA YILLTGPADSEAPGLVSVIKHKVRDLVPSSRVVRLGEGGPDSDQAIR 

SEG 

PRD eeeeeeccccceeeeeecccccccccceeeeeecccccccccceeeeecccccccchhhh 

MEM • 

30 SE«2 DRFSRLRYflSEA 

SEG 

PRD hhhhhhhhhccc 

MEM 

35 

(No Prosite data available for DKFZphamy2_11d2.3)

(No Pfam data available for DKFZphamy2_11d2.3)
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5 group: nucleic acid management 






DKFZphamy2_11n4 encodes a novel 1091 amino acid protein with
similarity to RAD18 of Schizosaccharomyces pombe and YLR383w of
Saccharomyces cerevisiae.

The novel protein contains an ATP/GTP-binding site motif A (P-loop).
It has similarity to RAD18, which acts in a DNA repair pathway
for removal of UV-induced DNA damage. YLR383w of Saccharomyces
cerevisiae is a recombination repair protein.

The new protein can find application in modulation of DNA repair
and as a new tool for manipulation of nucleic acids.
similarity to RAD18 (Schizosaccharomyces pombe)
comment on P53692:

FUNCTION: ACTS IN A DNA REPAIR PATHWAY FOR REMOVAL OF UV-INDUCED
DNA DAMAGE THAT IS DISTINCT FROM CLASSICAL NUCLEOTIDE EXCISION
REPAIR AND IN REPAIR OF IONIZING RADIATION DAMAGE.

Sequenced by EMBL

Locus: /map="2"

Insert length: 3679 bp

Poly A stretch at pos. 3646, polyadenylation signal at pos. 3620

1 ACCGCGGTGG GCGCCGGGGC TCCCGGGAAT CTACCTTCTC CTGCGGCCGG
51 CACGCGGTTC CCAGGGGGCC AGCGGCGGTC AGCCGAGGTC GAGACGCCCG
101 CAGGGTGGCC TTAGCGGCCG GTCGTACCAC GGCAGCCCCG CCGATCAGGT
151 TCCTTTGGGA GACTTCGACT TGTTGGCGAA ATGAACCGGA GAAGAATCCC
201 AATTGGGAAT TGCGGAAAAC AGGACTCTAG GGTAGAGAAA GGTTGTAGAA
251 CCAATAGGGT TTGAGACCTG ATGGCCAAAA GAAAGGAAGA AAATTTTTCC
301 TCTCCTAAAA ATGCCAAAAG GCCAAGACAA GAAGAATTGG AGGATTTTGA
351 TAAAGATGGT GACGAAGACG AATGTAAAGG TACTACTTTG ACTGCAGCAG
401 AAGTTGGAAT AATTGAGAGT ATTCACCTAA AAAACTTCAT GTGTCATTCA
451 ATGCTTGGAC CTTTTAAGTT TGGTTCTAAT GTCAACTTTG TTGTTGGCAA
501 CAATGGAAGT GGGAAGAGTG CAGTACTCAC AGCTCTCATA GTCGGTCTTG
551 GTGGAAGAGC AGTTGCTACT AATAGAGGAT CCTCTTTAAA AGGTTTTGTG
601 AAAGATGGAC AGAACTCTGC AGATATCTCA ATAACATTGA GGAACAGAGG
651 AGATGATGCC TTTAAAGCCA GTGTGTATGG TAACTCTATA CTTATACAGC
701 AACACATCAG CATAGATGGA AGTCGATCTT ATAAACTTAA AAGTGCAACA
751 GGCTCCGTGG TTTCCACGAG GAAAGAAGAG CTGATTGCAA TTCTTGATCA
801 TTTTAACATC CAGGTGGATA ATCCAGTTTC TGTTTTAACA CAAGAAATGA
851 GCAAGCAGTT CTTACAGTCT AAAAATGAAG GAGACAAATA CAAATTCTTC
901 ATGAAAGCAA CGCAACTTGA ACAGATGAAG GAAGATTATT CATACATTAT
951 GGAAACGAAA GAAAGAACAA AGGAGCAGAT ACATCAAGGA GAAGAGCGGC
1001 TTACTGAACT AAAGCGCCAG TGTGTAGAGA AAGAGGAACG TTTTCAAAGT
1051 ATTGCTGGTT TAAGTACAAT GAAGACTAAT TTAGAGTCCT TGAAACATGA
1101 AATGGCTTGG GCAGTGGTCA ATGAAATTGA AAAACAATTG AATGCCATCA
1151 GAGATAATAT CAAAATTGGA GAAGATCGTG CTGCTAGACT TGACAGGAAA

1201 ATGGAAGAAC AGCAGGTCAG ACTTAATGAG GCAGAACAAA AGTACAAGGA
1251 TATTCAAGAC AAACTAGAAA AGATTAGTGA AGAGACAAAT GCACGAGCAC
1301 CAGAATGTAT GGCATTGAAA GCAGATGTTG TTGCTAAGAA AAGGGCCTAT
1351 AATGAAGCTG AGGTTTTATA TAACCGATCC TTAAACGAAT ATAAAGCATT
1401 AAAGAAAGAT GATGAGCAGC TTTGTAAACG AATTGAAGAG CTGAAAAAAA
1451 GTACTGACCA ATCTTTGGAA CCTGAACGGT TGGAAAGACA AAAAAAAATA
1501 TCTTGGTTAA AAGAGAGAGT AAAGGCCTTT CAAAATCAAG AAAATTCAGT
1551 CAATCAAGAG ATCGAACAGT TTCAGCAAGC CATAGAAAAG GACAAAGAAG
1601 AACATGGCAA AATTAAGAGA GAAGAATTAG ATGTGAAGCA TGCACTGAGC
1651 TACAATCAGA GGCAACTGAA AGAATTGAAA GATAGTAAAA CTGATCGACT
1701 CAAAAGATTT GGCCCTAATG TTCCAGCTCT TCTTGAAGCC ATAGATGATG
1751 CTTATAGACA AGGACATTTT ACCTATAAAC CTGTAGGCCC TTTAGGAGCT
1801 TGCATTCATC TTCGGGACCC AGAACTTGCT TTGGCTATTG AATCTTGCTT
1851 AAAAGGGCTT CTGCAGGCCT ATTGTTGCCA TAATCATGCT GATGAAAGGG
1901 TCCTTCAGGC ACTCATGAAA AGGTTTTATT TACCAGGGAC CTCACGGCCA
1951 CCGATAATAG TTTCTGAGTT TCGGAATGAG ATATATGATG TAAGACACAG
2001 AGCTGCTTAT CATCCAGACT TTCCAACAGT TCTGACAGCT TTAGAAATAG
2051 ATAATGCGGT TGTGGCAAAT AGCCTAATTG ACATGAGAGG CATAGAGACA
2101 GTGCTACTAA TCAAAAATAA TTCTGTAGCT CGTGCAGTAA TGCAGTCCCA
2151 AAAGCCACCC AAAAATTGTA GAGAAGCTTT TACTGCTGAT GGTGATCAAG
2201 TTTTTGCAGG ACGTTATTAT TCATCTGAAA ATACAAGACC TAAGTTCCTA
2251 AGCAGAGATG TGGATTCTGA AATAAGTGAC TTGGAGAATG AGGTTGAAAA
2301 TAAGACGGCC CAGATATTAA ATCTTCAGCA ACATTTATCT GCCCTTGAAA
2351 AAGATATTAA ACACAATGAG GAACTTCTTA AAAGGTGCCA ACTACATTAT
2401 AAAGAACTAA AGATGAAAAT AAGAAAAAAT ATTTCTGAAA TTCGGGAACT
2451 TGAGAACATA GAAGAACACC AGTCTGTAGA TATTGCAACT TTGGAAGATG
2501 AAGCTCAGGA AAATAAAAGC AAAATGAAAA TGGTTGAGGA ACATATGGAG
2551 CAACAAAAAG AAAATATGGA GCATCTTAAA AGTCTGAAAA TAGAAGCAGA
2601 AAATAAGTAT GATGCAATTA AATTCAAAAT TAATCAACTA TCGGAGCTAG
2651 CAGACCCACT TAAGGATGAA TTAAACCTTG CTGATTCTGA AGTGGATAAC
2701 CAAAAACGAG GGAAACGACA TTATGAAGAA AAACAAAAAG AACACTTGGA
2751 TACCTTAAAT AAAAAGAAAC GAGAACTGGA TATGAAAGAG AAAGAACTAG
2801 AGGAGAAAAT GTCACAAGCA AGACAAATCT GCCCAGAGCG TATAGAAGTA
2851 GAAAAATCTG CATCAATTCT GGACAAAGAA ATTAATCGAT TAAGGCAGAA
2901 GATACAGGCA GAACATGCTA GTCATGGAGA TCGAGAGGAA ATAATGAGGC
2951 AGTACCAAGA AGCAAGAGAG ACCTATCTTG ATCTGGATAG TAAAGTGAGG
3001 ACTTTAAAAA AGTTTATTAA ATTACTGGGA GAAATCATGG AGCACAGATT
3051 CAAGACATAT CAACAATTTA GAAGGTGTTT GACTTTACGA TGCAAATTAT
3101 ACTTTGACAA CTTACTATCT CAGCGGGCCT ATTGTGGAAA AATGAATTTT
3151 GACCACAAGA ATGAAACTCT AAGTATATCA GTTCAGCCTG GAGAAGGAAA
3201 TAAAGCTGCT TTCAATGACA TGAGAGCCTT GTCTGGAGGT GAACGTTCTT
3251 TCTCCACAGT GTGTTTTATT CTTTCCCTGT GGTCCATCGC AGAATCTCCT
3301 TTCAGATGCC TGGATGAATT TGATGTCTAC ATGGATATGG TTAATAGGAG
3351 AATTGCCATG GACTTGATAC TGAAGATGGC AGATTCCCAG CGTTTTAGAC
3401 AGTTTATCTT GCTCACACCT CAAAGCATGA GTTCACTTCC ATCCAGTAAA
3451 CTGATAAGAA TTCTCCGAAT GTCTGATCCT GAAAGAGGAC AAACTACATT
3501 GCCTTTCAGA CCTGTGACTC AAGAAGAAGA TGATGACCAA AGGTGATTTG
3551 TAACTTAACA TGCCTTGTCC TGATGTTGAA GGATTTGTGA AGGGAAAAAA
3601 AATTCTGGAC TCTTTGATAT AATAAAATGA GACTGGAGGC ATTCTGAAAA
3651 AAAAAAAAAA AAAAAAAAAA AAAAAAAAA

Lehmann AR, Walicka M, Griffiths DJ, Murray JM, Watts FZ, McCready S,
Carr AM. The rad18 gene of Schizosaccharomyces pombe defines a
new subgroup of the SMC superfamily involved in DNA
repair. Mol Cell Biol 1995 Dec;15(12):7067-80

^BflQlb?:

Mengiste T, Revenkova E, Bechtold N, Paszkowski J. An SMC-like
protein is required for efficient homologous
recombination in Arabidopsis. EMBO J 1999 Aug 16;18(16):4505-15



Peptide information for frame 1 
20 

ORF from 271 bp to 3543 bp; peptide length: 1091
Category: similarity to known protein
Classification: Nucleic acid management
Prosite motifs: RGD (126-128)
ATP_GTP_A (76-83)
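
The ATP_GTP_A entry corresponds to the PROSITE P-loop pattern [AG]-x(4)-G-K-[ST]. The sketch below locates that pattern in a peptide with a regular expression. It is an illustration only, not the PROSITE scanning software; it assumes the motif coordinates above read as 76-83, and `find_p_loops` is a hypothetical helper that reports non-overlapping matches.

```python
import re

# PROSITE pattern [AG]-x(4)-G-K-[ST] written as a regular expression.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(peptide, first_residue=1):
    """Return 1-based inclusive (start, end) positions of each P-loop match.

    `first_residue` is the sequence position of peptide[0], so that a
    fragment can be scanned while reporting full-length coordinates.
    """
    return [
        (m.start() + first_residue, m.end() + first_residue - 1)
        for m in P_LOOP.finditer(peptide.upper())
    ]


# Residues 51-90 of the frame 1 peptide, taken from the listing below.
fragment = "IHLKNFMCHSMLGPFKFGSNVNFVVGNNGSGKSAVLTALI"
print(find_p_loops(fragment, first_residue=51))
```

The RGD entry can be checked the same way with a plain substring search.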



1 MAKRKEENFS SPKNAKRPRfl EELEDFDKDG DEDECKGTTL TAAEVGIIES 

30 51 IHLKNFMCHS MLGPFKFGSN VNFVVGNNGS GKSAVLTALI VGLGGRAVAT 

101 NRGSSLKGFV KDGGNSADIS ITLRNRGDDA FKASVYGNSI LI<2<2HISIDG 

151 SRSYKLKSAT GSVVSTRKEE LIAILDHFNI tfVDNPVSVLT EMSK<2FLf2S 

501 KNEGDKYKFF MKATflLEtfWC EDYSYIFtETK ERTKE(2IH(2G EERLTELKRd 

551 CVEKEERFtfS IAGLSTMKTN LESLKHEflAU AVVNEIEKQL NAIRDNIKIG 

35 301 EDRAARLDRK MEE(2(2VRLNE AEdKYKDI(2D KLEKISEETN ARAPECMALK 

351 ADVVAKKRAY NEAEVLYNRS LNEYKALKKD DE6LCKRIEE LKKSTDOSLE 

401 PERLERdKKI SWLKERVKAF (2N<2ENSVNt2E IE<2Fi3<2 AIEK DKEEHGKIKR 

451 EELDVKHALS YNflR(3LKELK DSKTDRLKRF GPNVPALLEA IDDA YRdGHF 

501 TYKPVGPLGA CIHLRDPELA LAIESCLKGL Lt2A YCCHNHA DERVL(2ALMK 

40 551 RFYLPGTSRP PIIVSEFRNE IYDVRHRAAY HPDFPTVLT A LEIDNAVVAN 

bOl SLIDMRGIET VLLIKNNSVA RAVNdStfKPP KNCREAFTAD GDflVFAGRYY 

b51 SSENTRPKFL SRDVDSEISD LENEVENKTA dlLNLflflHLS ALEKDIKHNE 

7D1 ELLKRCflLHY KELKNKIRKN ISEIRELENI EEH<3SVI>IAT LEDEAflENKS 

751 KMKMVEEHME <2<3KENI1EHLK SLKIEAENKY DAIKFKINtfL SELADPLKDE 

45 flOl LNLADSEVDN tSKRGKRHYEE Kt2KEHLDTLN KKKRELDMKE KELEEKMStfA 

fl51 RtflCPERIEV EKSASILDKE INRLROKIdA EHASHGDREE If1R<2Y(3EARE 

^01 TYLDLDSKVR TLKKFIKLLG EINEHRFKTY CK2FRRCLTLR CKLYFDNLLS 

^51 (3RAYCGKMNF DHKNETLSIS VflPGEGNKAA FNDNRALSGG ERSFSTVCFI 

1DD1 LSLWSIAESP FRCLDEFDVY flDMVNRRIAM DLILKNADStf RFR(2FILLTP 

50 1051 <2SNSSLPSSK LIRILRMSDP ERGtfTTLPFR PVT<2EEDDD<3 R 
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BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphamyE_llnM i frame 1 
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SUISSPR0T:RAlfl_SCHP0 DNA REPAIR PROTEIN RADlfl - i N = 1 Score = 

10E1-, P 

= 5.Se-103 

5 

PIR:S5m?0 hypothetical protein YLR3fl3w - yeast (Saccharomyces 
cerevisiae)n N = li Score = fiE3-i P = 5e-flE 



10 >SUISSPROT:RAlfl_SCHPO DNA REPAIR PROTEIN RADlfi - 

Length = 1,140

HSPs:

Score = 1021 (153.2 bits), Expect = 5.2e-103, P = 5.2e-103
Identities = 315/1091 (28%), Positives = 540/1091 (49%)

fluery: E AKRKEENFSSPKNAKRPR12EELEDF — DKDGDEDECKGTTLTAAE 

VGIIESIHLKN 55 

20 A R + + N ++ + +E + + DG + D T T + 

VG + IE IHL N 
Sbjct: MS 

ASRN(2DNRPER(2SRLC3RSSSLIE(3VRGNE1>GENDVLN(3TRETNSNFDNRVGVIECIHLVN 1DM 
25 Query: 5b 

FnCHSNLGPXXXXXXXXXXXXXXXXXXXA VLTALIVGLGGRAVATNRGSSLKGFVKDGC2N 115 
FMCH L A+LT L + LG +A TNR ++K 

VK G+N 

Sbjct: 105 FMCHDSL- 
30 KINFGPRINFVIGHNGSGKSAILTGLTICLGAKASNTNRAPNNKSLVKC3GKN lb3 

Query- lib 

SADISITLRNRGDDAFKASVYGNSILI(2<2HISIDGSRSYKLKSATGSVVSTRKEELIAIL 175 
A IS+T+ NRG +A++ +YG SI I++ I +GS Y+L+S 

35 G+V+ST+++EL I 
Sbjct: IbU 

YARISVTISNRGFEAYdPEIYGKSITIERTIRREGSSEYRLRSFNGTVISTKRDELDNIC EE3 
Guery: 17b 

40 BHFNIt3VDNPVSVLTt2EnSK(2FL(2SKNEGl>KYKFFniCATi3LEflriKEDYSYiriETKERTKE E35 

DH +<2 + DNP+ + + LT(3 + ++C3FL + + + KY+ FMK <2L + <2++E+YS I 

+ + TK 
Sbjct: EEM 

DHHGL(3IDNPriNILT(2I>TAR(3FLGNSSPKEKY(3LFnKGIi2LK(3LEENYSLIE(3SLINTKN Efl3 

45 

fluery : E3b 

(3IH<3GEERLTELKR(3CVEKEERF(2SIAGLSTI1KTNLESLKHEnAli)A VVNEIEKtJLNAIRD ET5 
+ + ++ L ++ E + ++ + LE K EH UA V 

E+EK+L 
50 Sbjct: EflM 

VLGNKKTGVSYLAKKEEEYKLLUE(2SRETENLHNLLEfiKKGEnVUA(3VVEVEKEL 333 

fluery: 5Tb NIKIGEDRAARLDRKnEE(2(2VRLNEAE<3KYKDItf DKLEKISEETNARAP- 
ECflALKADVV 35M 

55 + E + K+ E + L DI K+ EE RA E 

K + 

Sbjct: 33T — LLAEKEF(2HAEVKLSEAKENLESIVTN<2SDIDGKISS- 
KEEVIGRAKGETDTTKSKFE 3^5 
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fluery: 355 

AKKRAYNEAEVLYNRSLNEYKALKKDDEflLCKRIEELKKSTDflSLEPERLERflKKISULK Mm 
.' + ++ Y +N+ K+D + I K D E ER 

5 ++ + 

Sbjct: 3Tb DIVKTFDG YRSEflNDVDIflKRDIflN 

SINAAKSCLDVYREflLNTERARENNLGG 443 

fluery: 415 ERVKAFflNflENSVNflEIEflF-flflAIEKDKE EHG 

10 KIKREELDVKHALS 4bO 

+++ N+ N++ +EI +fl +E + + EG + ++ 

+ + +S 
Sbjct: 

SfllEKRANESNNLflREIADLSEfllVELESKRNDLHSALLEflGGNLTSLLTKKDSIANKIS 506 

15 

fluery: 4bl 

YNflRflLKELKDSKTDRLKRFGPNVPALLEAIDDAYRflGHFTYKPVGPLGACIHLRDPELA 5E0 

LK L+D + D++ FG N+P LL+ I R+ F + P GP+G + 

+++ +• 

20 Sbjct: SOI DflSEHLKVLEDVflRDKVSAFGKNUPflLLKLIT 

RETRFflHPPKGPIIGKYMTVKEflKlilH 5b5 

Query: SSI 

LAIESCLKGLLflA YCCHNHADERVLflALMKRFYLPGTSRPPIIVSEFRNEIYDVRHRAAY 5S0 
25 L IE L ++ + +H D+ +L+ LI1++ T ++V + 

YD ++ 

Sbjct: 5bb LIIERILGNVINGFIVRSHHDflLILKELMRflSNCHAT VVVGK 

YDPFDYSSG bib 

fluery: Sal HPD — 

FPTVLTALEIDNAVVANSLIDMRGIETVLLIKNNSVARAVflflSQKPPKNCREAFT b3S 

PD +PTVL ++ D+ V + + LI+ GIE +LLI + + A A f1+ + 

N + + 

Sbjct: bl7 EPDSflYPTVLKIIKFDDDEVLHTLINHLGIEKMLLIEDRREAEAYMK — 
RGIANVTflCYA b?4 

fluery: 1,31 ADG-DflVFAGRYYSSENTR — PKFLSRDVDSEI 

SDLENEVENKTAfllLNLflflHLSAL b^2 

D ++ + R S++ + K + I S EEK L 
40 fl + ++ 

Sbjct: b?5 

LDPRNRGYGFRIVSTflRSSGISKVTPUNRPPRIGFSSSTSIEAEKKILDDLKKflYNFASN 734 

fluery: b<53 E-OIKHNEELLKRCflLHYKELKMKIRKNIS-EIRELENIEEHfl-SV-D 

45 IATLEDEA 74S 

+ + K + KR + E I+K I + RE+ ++E + SV D 

I TLE 

Sbjct: 735 

ALNEAKIEflAKFKRDEflLLVEKIEGIKKRILLKRREVNSLESQELSVLDTEKIflTLERRI 714 

50 

fluery: 74b flENKSKIUCMVEEHMEflflKENMEH- 
LKSLKIEAENKYDAIKFKINflLSELADPLKDELN-L 603 

E + +++ ++ K N EH ++ + + + KI ++ 

L+ EL+ L 

55 Sbjct: 715 SETEKELESYAGflLflDAK- 

NEEHRIRDNflRPVIEEIRIYREKIflTETflRLSSLflTELSRL 653 







Query* acm 

ADSEVDNt3KRGKRHYEEK(3KEHLI>TLNXXXXXXXXXXXXXXXXXSt3AR(3ICPERIEVEKS fib3 
D + +++ +RH + + + L ++A C 

ER + V+ S 

5 Sbjct : ASM R]>EKRNSEVI>IERH-R(2TVESCTNILREKEAICKV(3CA(2VVADYTAKANTRC- 
ERVPVdLS ^11 

fluery: fibM ASILDKEINRLRflKItfAEHASHG- 
DREEIMRflYdEARETYLDLDSKVRTLKKFIKLLGEI ^22 
10 + LD EI RL+ +1 G E+ Y A + E + V L + 

++ L E 

Sbjct: ^12 

PAELDNEIERL(3f1t3IAEURNRTGVSVE(3AAEPYLNAKEKHD(3AKVLVARLTt3LL(3ALEET ^71 
15 fluery: ^23 

MEHRFKTY(2(3FRRCLTLRCKLYFDNLLSc2RAYCGKI1NFDHKNETLSISV(3P6EGNK:a-AF TBI 
+ R + + +FR + +TLR K F+ LSdR + GK+ H+ E L VP 

N A A 

Sbjct: ^72 

20 LRRRNEMUTKFRKLITLRTKELFEL YLS(3RNFT6KL VIKH(3EEFLEPRVYPANRNLATAH 1031 
fluery: ^B2 N 

DflRALSGGERSFSTVCFILSLUSIAESPFRCLDEFDVYnDnVWRRIAflDLIL 103M 

N ++ LSGGE+SF+T+C +LS+W P RCLDEFDV+MD VNR 

25 +++ +++ 

Sbjct: 1032 

NRHEKSKVSVfiGLSGGEKSFATICnLLSIUEAnSCPLRCLDEFDVFriPAVNRLVSIKnMV 10<?1 

(3uery: 1D35 KMADS(2RFR«3FILLTP(2SI1SSLPSSK:LIRILRriSDPERG(2TTLP 107B 
30 A + <2FI +TP(2 H + K + + R+SDP + LP 

Sbjct: 10^2 DSAKPSSDIC(3FIFITP(31>n6(2I6LI>KDVVVFRLSDPVVSSSALP 1135 
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Pedant information for DKFZphamy2_llnM -i frame 1 
Report for DKFZphamy2_11n4.1



40 CLENGTH3 1D11 

EMIO 12t.32b.13 

CpI3 b-S7 

CH0H0L3 SWISSPROT:RAia_SCHPO DNA REPAIR PROTEIN RADlfi- le- 

10*1 

45 CFUNCAT3 03-1^ recombination and dna repair CS* cerevisiaen 

YLR363w3 le-afl 

EFUNCATID OB. 07 vesicular transport (golgi network-i etc) ES- 
cerevisiaei YDLOSBuO 3e-lb 

CFUNCAT3 30-03 organization of cytoplasm (US- cerevisiaen 

50 YDL05Bw3 3e-lb 

EFUNCATJ 0*1.13 biogenesis of chromosome structure ES. 
cerevisiae-i YLR0abw3 2e-m 

CFUNCAT3 1 genome replications transcriptions recombination and 

repair EH- jannaschii-i MJlfci^ID 3e-m 

55 CFUWCATHl 30. 0*4 organization of cytoskeleton ES- cerevisiae-» 

YIL14^c3 le-12 

CFUNCATJ 03-22 cell cycle control and mitosis CS- cerevisiaen 

YDR3StiWlD ae-12 
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EFUNCATJ CH-IO nuclear biogenesis 



PCT/IB01/02050 
ES. cerevisiaei YDR35fawJ 

Ae-12 
EFUNCATJ 
3e-ll 

EFUNCATJ 11. 0M dna repair (direct repair-, base excision repair 
and nucleotide excision repair) ICS- cerevisiae-. YKROTSwJ Be-D^ 



30-10 nuclear organization 



ES- cerevisiae-. YFLDOAwJ 



EFUNCATJ 
5e-0T 
EFUNCATJ 
myosin-1 
EFUNCATJ 
ES- 

EFUNCATJ 



11 unclassified proteins ES- cerevisiae-i Y0R21facJ 

ES. cerevisiae-. YHRDEBw NY01 - 



03-25 cytokinesis 
isoformJ Ae-OA 

□3-DH budding-, cell polarity and filament formation 
cerevisiasi YHR023w I1Y01 - myosin-1 isoformJ Ae-OA 
OA-22 cytoskeleton-dependent transport ES- cerevisiae-. 
YHRD23w MY01 - myosin-1 isoformJ Ae-OA 

EFUNCATJ 0L..D7 protein modification (glycolsylation-. acylation-i 
myristylation-. palmitylation f arnesylation and processing) 

ES- cerevisiae-i YKL201cJ 2e-07 
EFUNCATJ 03-13 meiosis ES. cerevisiae-. YDR2A5wJ Me-07 
EFUNCATJ 3D- 13 organization of chromosome structure ES. 
cerevisiae-. YDR2A5wJ Me-07 

EFUNCATJ TA classification not yet clear-cut ES- cerevisiae-. 

YJR13HcJ 7e-07 

EFUNCATJ 0b-10 assembly of protein complexes ES - cerevisiae-i 

YPRlMlcJ 7e-07 

EFUNCATJ 3D. OS organization of centrosome ES . cerevisiae-i 
YPRlMlcJ 7e-07 

EFUNCATJ 11-01 stress response ES - cerevisiae-* YPRlHlcJ 7e-07 
EFUNCATJ Q3-07 pheromone response-i mating-type determination-! 
sex-specific proteins ES- cerevisiae-. YPRlMlcJ 7e-07 
EFUNCATJ r general function prediction EH. influenzae-. 

HI075bJ le-Ofa 

EFUNCATJ 10-05- ^ other pheromone response activities ES. 
cerevisiae-, YHR15AcJ 2e-0b 

EFUNCATJ 05-0*4 translation (initiation-, elongation and 
termination) ES- cerevisiae-. YAL035wJ 3e-0M 
EFUNCATJ 30-02 organization of plasma membrane 



YEROOAcJ 
EFUNCATJ 
YEROOAcJ 
EFUNCATJ 
YKL17^cJ 
EFUNCATJ 



Me-QM 
DA.lt 

Me-OM 
O^.DM 

7e-0M 
03-22. 



extracellular transport 
biogenesis of cytoskeleton 



ES . cerevisiae-. 



ES- cerevisiae-. 



ES • cerevisiae-. 



01 cell cycle check point proteins 
cerevisiae-. YGLOAbwJ 7e-0i* 

EFUNCATJ OA-01 nuclear transport ES- cerevisiae-. 
EFUNCATJ OM-07 rna transport ES. cerevisiae-. 

EBLOCKSJ BL0032bC Tropomyosins proteins 
EBLOCKSJ PROIOOMB 

EBLOCKSJ BL00121A Colipase proteins 
EBLOCKSJ PF0D5A0A 

ESCOPJ d2tmab_ 1-105.4.1.1 Tropomyosin Erabbit 

(Oryctolagus cuniculus) 3e-0b 

EECJ 3. fa- 1-32 Myosin ATPase ^e-20 

EPIRKUJ phosphotransferase Te-lfa 

EPIRKWJ nucleus 2e-10 

EPIRKWJ blocked amino end 2e-07 

EPIRKWJ citrulline 2e-10 

EPIRKWJ tandem repeat ^-20 

EPIRKWJ heterodimer 3e-ll 



ES- 

YDL207w J 0-001 
YDL207wJ 0-001 
[PIRKW] endocytosis 2e-13
[PIRKW] heart 9e-20
[PIRKW] polymorphism 1e-10
[PIRKW] serine/threonine-specific protein kinase 9e-16
[PIRKW] transmembrane protein 8e-15
[PIRKW] zinc finger 2e-13
[PIRKW] metal binding 2e-13
[PIRKW] DNA binding 2e-06
[PIRKW] muscle contraction 9e-20
[PIRKW] acetylated amino end 3e-13
[PIRKW] actin binding 9e-20
[PIRKW] mitosis 8e-10
[PIRKW] microtubule binding 3e-09
[PIRKW] chromosomal protein 3e-11
[PIRKW] ATP 9e-20
[PIRKW] receptor 2e-06
[PIRKW] thick filament 9e-20
[PIRKW] phosphoprotein 2e-14
[PIRKW] glycoprotein 1e-10
[PIRKW] skeletal muscle 1e-18
[PIRKW] calcium binding 2e-10
[PIRKW] alternative splicing 3e-12
[PIRKW] DNA condensation 3e-11
[PIRKW] P-loop 9e-20
[PIRKW] coiled coil 9e-20
[PIRKW] heptad repeat 1e-10
[PIRKW] methylated amino acid 9e-20
[PIRKW] basement membrane 1e-10
[PIRKW] immunoglobulin receptor Me-0 1 ^
[PIRKW] peripheral membrane protein 2e-13
[PIRKW] cardiac muscle 9e-20
[PIRKW] extracellular matrix 1e-10
[PIRKW] hydrolase 9e-20
[PIRKW] microtubule 2e-10
[PIRKW] muscle 2e-14
[PIRKW] membrane protein 1e-10
[PIRKW] EF hand 2e-10
[PIRKW] cell division 8e-10
[PIRKW] cytoskeleton 1e-13
[PIRKW] hair 2e-10
[PIRKW] calmodulin binding 2e-13
[PIRKW] Golgi apparatus 6e-08
[PIRKW] smooth muscle 4e-07
[SUPFAM] conserved hypothetical P115 protein 4e-26
[SUPFAM] myosin heavy chain 9e-20
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 9e-16
[SUPFAM] centromere protein E 3e-09
[SUPFAM] calmodulin repeat homology 2e-10
[SUPFAM] alpha-actinin actin-binding domain homology 7e-07
[SUPFAM] myosin motor domain homology 9e-20
[SUPFAM] tropomyosin 5e-08
[SUPFAM] plectin 7e-07
[SUPFAM] pleckstrin repeat homology 3e-09
[SUPFAM] trichohyalin 2e-10
[SUPFAM] hypothetical protein MJ1322 2e-06
[SUPFAM] ribosomal protein S10 homology 7e-07
[SUPFAM] protein kinase C zinc-binding repeat homology 5e-09
[SUPFAM] giantin 9e-12
[SUPFAM] protein kinase homology 1e-16
[SUPFAM] kinesin motor domain homology Be-DI
[SUPFAM] human early endosome antigen 1 5e-13
[SUPFAM] M5 protein Me-OI
[SUPFAM] cytoskeletal keratin 5e-06
[PROSITE] ATP_GTP_A 1
[PROSITE] RGD 1
[KW] All_Alpha
[KW] LOW_COMPLEXITY 3.30 %
[KW] COILED_COIL 15.15 %

SE(3 MAKRKEENFSSPKNAKRPRflEELEDFDOGDEDECKGTTLTAAEVGIIESIHLKNFIICHS 

15 SEG 

PRD ccchhhhhcccccccccchhhhhhcccccccccccccccccccccceeeeehhhhhhccc 
COILS 

20 SE(3 MLGPFKFGSNVNFVVGNNGSGKSA VLTALIVGLGGRAVATNRGSSLKGFVKDGflNSADIS 

SEG . • . - xxxxxxxxxxxxxxxxxxx 

PRD ccccccccceeeeeeccccccchhhhhhhhhhcccccccccccccceeeecccccceeee 
COILS 

25 

SE<J ITLRNRGDDAFKASVYGNSILIflflHISIDGSRSYKLKSATGSVVSTRKEELIAILDHFNI 

SEG - 

PRD eeeecccccccccccccccccchhhhhccccceeeeccccchhhhhhhhhhhhhhhhhhh 
COILS 

30 - 

SE(3 i2VDNPVSVLT(3EF1SK(}FL(3SKNEGDKYKFFMKAT(3LEi3MlCEDYSYIt1ETKERTKE«3IH(2G 

SEG 

PRD cccchhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
35 COILS 

SE<3 EERLTELKRflCVEKEERFfiSIAGLSTMKTNLESLKHEMAUIA VVNEIEKflLNAIRDNIKIG 

SEG 

40 PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

• CCCCCCCCCCCCCCCC 

SE<3 EDRAARLDRKHEE(3(2VRLNEAEi3KYKDIflDKLEKISEETNARAPECriALKAD VVAKKRAY 

45 SEG - 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

50 SE(2 NEAEVLYNRSLNEYKALKKDDEflLCKRIEELKKSTDflSLEPERLERGKKISWLKERVKAF 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

55 

SE<2 <2NflENSVNc2EIE<2F<2<MIEKDKEEH6KIKREELDVKHALSYN<2R<2LKELKDSKTDRLKRF 

SEG • 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

SE<2 GPNVPALLEAIDDAYR<2GHFTYKPVGPLGACIHLRDPELALAIESCLKGLL(2AYCCHNHA 
5 SEG 

PRD hhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

10 SEC DERVLOALMKRFYLPGTSRPPIIVSEFRNEIYDVRHRAAYHPDFPTVLTALEIDNAVVAN 
SEG • •. 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhh 
COILS 



SE(3 SLIDMRGIETVLLIKNNSVARAVMflSiSKPPKNCREAFTADGDflVFAGRYYSSENTRPKFL 
SEG • 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhh 
COILS 



SEC SRDVDSEISDLENEVENKTA(2ILNL<3<2HLSALEKDIKHNEELLKRC(3LHYKELKnKIRKN 
SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
25 COILS 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SE<2 ISEIRELENIEEHflSVI>IATLEDEA(2ENKSKriKnVEEHI1EtSt2KENHEHLKSLKIEAENKY 

SEG 

30 PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh^ 
COILS 

• • • -CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEfl DAIKFKINcJLSELADPLKDELNLADSEVDNflKRGKRHYEEKflKEHLDTLNKKKRELDIIKE 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

CCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCC 

40 SEC KELEEKMSOAR<2ICPERIEVEKSASILDKEINRLR(2KI<2AEHASHGDREEiriR<2YfiEARE 
SEG xxxxxxx • 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhh 
COILS 

CCCCCCCCCCCCCC 



SEfl TYLDLDSKVRTLKKFIKLLGEIf1EHRFKTY<3<2FRRCLTLRCKLYFDNLLS<2RAYCGKf1NF 
SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccceee 
COILS 



SE<2 DHKNETLSISVflPGEGNKAAFNDMRALSGGERSFSTVCFILSLUISIAESPFRCLDEFDVY 
SEG 

PRD eccccccceeeeccccchhhhhhcccccccchhhhhhhhhhhhhhhhccccchhhhhhhh 
55 COILS 



SEtf nDMVNRRIAI1DLILKriADS(?RFRC2FILLTPi3SnSSLPSSKLIRILRnSDPERG(3TTLPFR 
SEG

PRD hhhhhhhhhhhhhhhhhhhhhhceeeeeeccccccccccceeeeeecccccccccccccc 
COILS 

5 * • 

SE0 PVTflEEDDDGR 

SEG 

PRD chhhhhhhccc 
COILS 

10 

Prosite for DKFZphamy2_11n4.1

PS00016 126->129 RGD PDOC00016
PS00017 76->84 ATP_GTP_A PDOC00017

(No Pfam data available for DKFZphamy2_11n4.1)
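The two Prosite hits can be checked against the legible parts of the SEQ rows with a short pattern scan. The following is a minimal sketch, not the PEDANT implementation: the regular expressions are this sketch's own rendering of the PROSITE patterns R-G-D and [AG]-x(4)-G-K-[ST], the function name `scan` is hypothetical, and the fragments are only the unambiguous stretches of the second and third SEQ rows (assumed to start at residues 61 and 121, since rows are 60 residues wide).

```python
import re

# Regular-expression renderings of the two PROSITE patterns named in the
# report. The translation of pattern syntax is an assumption of this sketch:
#   RGD        ->  R-G-D
#   ATP_GTP_A  ->  [AG]-x(4)-G-K-[ST]
PATTERNS = {"RGD": r"RGD", "ATP_GTP_A": r"[AG].{4}GK[ST]"}


def scan(fragment, first_residue):
    """Return sorted (name, start, end) hits for one sequence fragment.

    start is the 1-based residue number of the first matched residue and
    end is the residue number just past the match.
    """
    hits = []
    for name, regex in PATTERNS.items():
        for m in re.finditer(regex, fragment):
            hits.append((name, first_residue + m.start(), first_residue + m.end()))
    return sorted(hits, key=lambda hit: hit[1])


# Legible stretches of the second and third SEQ rows of the report.
p_loop_hits = scan("MLGPFKFGSNVNFVVGNNGSGKSAVLTALIVGLGGRAVATNRGSSLKGFVKDG", 61)
rgd_hits = scan("ITLRNRGDDAFKASVYGNSILI", 121)
```

Under these assumptions the scan places the P-loop motif at 76->84 and the RGD motif at 126->129, which suggests the report gives the end coordinate as one past the last matched residue.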

DKFZphamy2_121f11

group: cell structure and motility

DKFZphamy2_121f11 encodes a novel 251 amino acid protein with
high similarity to a rat ankyrin binding glycoprotein-1 related
mRNA.

Ankyrin binding glycoproteins play a role in neural cell adhesion
and in prostate tumor cell transformation. DKFZphamy2_121f11.3 is
expressed above average in brain, uterus and prostate.

The new protein can find application in modulating
cytoskeleton-membrane interactions.

similarity to ankyrin binding glycoprotein-1 related mRNA (Rattus
norvegicus)

Sequenced by DKFZ

Locus: /map="1"

Insert length: 1498 bp

Poly A stretch at pos. 1479, polyadenylation signal at pos. 1460

1 CGGCACCTTC GCCGGCGCCC TCGCCCACCC CAGCCCCGCC CCAGAAGGAG
51 CAGCCCCCCG CGGAGACCCC TACAGACGCT GCTGTCTTGA CCTCACCCCC
101 AGCCCCTGCT CCCCCGGTGA CCCCTAGCAA ACCAATGGCC GGCACCACAG
151 ACCGAGAAGA AGCCACTCGG CTCTTGGCTG AGAAGCGGCG CCAGGCCCGG
201 GAGCAGCGGG AGCGCGAGGA GCAGGAGCGG AGGCTGCAGG CAGAAAGGGA
251 CAAGCGAATG CGAGAGGAGC AGCTGGCACG GGAGGCCGAG GCCCGGGCGG
301 AGCGGGAGGC GGAGGCCCGG AGGCGGGAGG AGCAGGAGGC ACGAGAGAAG
351 GCGCAGGCCG AGCAGGAGGA GCAGGAGCGG CTGCAGAAGC AGAAAGAGGA
401 GGCCGAAGCT CGGTCGCGGG AAGAGGCGGA GCGGCAGCGT CTGGAGCGGG
451 AAAAGCACTT CCAGCAGCAG GAGCAAGAGC GGCAAGAGCG CAGAAAGCGT
501 CTGGAGGAGA TCATGAAGAG GACTCGGAAG TCAGAAGTTT CTGAAACCAA
551 GAAGCAGGAC AGCAAGGAGG CCAACGCCAA CGGTTCCAGC CCAGAGCCTG
601 TGAAAGCTGT GGAGGCTCGG TCCCCAGGGC TGCAGAAGGA GGCTGTGCAG
651 AAAGAGGAGC CCATCCCACA GGAGCCTCAG TGGAGTCTCC CAAGCAAGGA
701 GTTGCCAGCG TCCCTGGTGA ATGGCCTGCA GCCTCTCCCA GCACACCAGG
751 AGAATGGCTT CTCCACCAAC GGACCCTCTG GGGACAAGAG TCTGAGCCGA
801 ACACCAGAGA CACTCCTGCC CTTTGCAGAG GCAGAAGCCT TCCTCAAGAA
851 AGCTGTGGTG CAGTCCCCGC AGGTCACAGA AGTCCTTTAA GAGGGTTTGC
901 CTTGGATCCG GGCACAGTTG TGAGGGCTCC TCTGCATCAC CTACCAGGAT
951 GTCTGGAGGA GAAAAAGACA GAACAAAGAT GGAAGTGGCC TGGGCCCCTG
1001 GGGGTGGGTC CTCTCTGTTG TTTTTAATCT GCACCTTATA GACTGATGTC
1051 TCTTTGGCCG GAGCCAGATC TGCCCCTCAG TGCATTCGTG TGCTCGCACG
1101 CGCAGACATC CCTTCTCCCC CATACACACA TATACACTCA CAGCCTCTCT
1151 GGCCTCTTCC CTTGGGGAGG GGCCACCTGT AGTATTTGCC TTGATTTGGT
1201 GGGGTACAGT GGATGTGAAT ACTGTAAATA GCTTGTGCTC AGACTCCTCT
1251 GCGTGGAGAG GGTGGGTGCA GGAGGCAGAC CCTCCCCCCA AAGCCCCCTG
1301 GGGAGATCTT CCTCTCTCTA TTTAACTGTA ACTGAGGGGG ATCCCAGGTC
1351 TGGGGATGGG GGACACCTTG GGCCACAGGA TACTGGTTGC TTCAGGGGTA
1401 CCCATGCCCC CTGCCCTCGC CTGGAATCAG TGTTACTGCA TCTGATTAAA
1451 TGTCTCCAGA AATAAAGAAT AATTCTGCCA AAAAAAAAAA AAAAAAAA
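The polyadenylation coordinates quoted for this insert can be cross-checked against the last row of the listing. The sketch below is illustrative only: the helper names `find_signal` and `poly_a_start` are hypothetical, the motif set (AATAAA and the ATTAAA variant) is an assumption, and reading the reported positions as zero-based offsets is an inference from this one row rather than something the listing states.

```python
def find_signal(tail, first_pos, motifs=("AATAAA", "ATTAAA")):
    """Return (motif, zero_based_offset) of the last signal motif in tail.

    first_pos is the 1-based coordinate of tail[0] in the full insert.
    Returns None when no motif is present.
    """
    best = None
    for motif in motifs:
        i = tail.rfind(motif)
        if i != -1 and (best is None or i > best[1]):
            best = (motif, i)
    if best is None:
        return None
    return best[0], (first_pos - 1) + best[1]


def poly_a_start(tail, first_pos):
    """Zero-based offset at which the trailing run of A residues begins."""
    return (first_pos - 1) + len(tail.rstrip("A"))


# Final row of the listing (bases 1451 to 1498), written in 10-base groups.
tail = ("TGTCTCCAGA" "AATAAAGAAT" "AATTCTGCCA" "AAAAAAAAAA" "AAAAAAAA")
signal = find_signal(tail, 1451)
poly_a = poly_a_start(tail, 1451)
```

Read this way, the motif offset and the start of the A run agree with the positions given in the entry header, so the same convention can be used when checking the other inserts.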

BLAST Results

No BLAST result

Medline entries

No Medline entry

Peptide information for frame 3

ORF from 135 bp to 887 bp; peptide length: 251
Category: putative protein
Classification: Cell signaling/communication
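The ORF coordinates, reading frame and peptide length in these reports are tied together by simple arithmetic, which is useful for catching misread digits. A minimal sketch follows; the function names are hypothetical, and it assumes 1-based inclusive coordinates that exclude the stop codon, which is how the figures in this listing appear to behave.

```python
def orf_summary(start, end):
    """Codon count and reading frame (1, 2 or 3) for a 1-based inclusive ORF."""
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return {"codons": span // 3, "frame": (start - 1) % 3 + 1}


def orf_start(end, codons):
    """Recover the 1-based start coordinate from the end and the codon count."""
    return end - 3 * codons + 1


first = orf_summary(135, 887)     # this entry
second = orf_summary(177, 1616)   # a 480-residue ORF ending at 1616
derived_start = orf_start(831, 212)
```

For this entry the span 135 to 887 gives 251 codons in frame 3, matching the stated peptide length. The last call shows how a missing start coordinate could be derived from an end coordinate and a peptide length; the value it returns is a calculation, not a figure taken from the listing.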

1 MAGTTDREEA TRLLAEKRRQ AREQREREEQ ERRLQAERDK RMREEQLARE
51 AEARAEREAE ARRREEQEAR EKAQAEQEEQ ERLQKQKEEA EARSREEAER
101 QRLEREKHFQ QQEQERQERR KRLEEIMKRT RKSEVSETKK QDSKEANANG
151 SSPEPVKAVE ARSPGLQKEA VQKEEPIPQE PQWSLPSKEL PASLVNGLQP
201 LPAHQENGFS TNGPSGDKSL SRTPETLLPF AEAEAFLKKA VVQSPQVTEV
251 L
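The peptide can be re-derived from the nucleotide listing by translating from the ORF start, which is a convenient way to resolve residues that are hard to read. The sketch below assumes the standard genetic code and uses only the two nucleotide rows that cover bases 101 to 200; `translate` is a hypothetical helper written for this illustration.

```python
# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate complete codons from the first base up to the first stop."""
    peptide = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# Rows 101 and 151 of the insert, written in 10-base groups.
row_101 = ("AGCCCCTGCT" "CCCCCGGTGA" "CCCCTAGCAA" "ACCAATGGCC" "GGCACCACAG")
row_151 = ("ACCGAGAAGA" "AGCCACTCGG" "CTCTTGGCTG" "AGAAGCGGCG" "CCAGGCCCGG")
orf_start_1based = 135
fragment = (row_101 + row_151)[orf_start_1based - 101:]
n_terminus = translate(fragment)
```

The translated fragment reproduces the first 22 residues of the peptide, including the glutamine at position 20.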

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphamy2_121f11, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphamy2_121f11, frame 3

Report for DKFZphamy2_121f11.3

[LENGTH]
[MW] 33517. =}t>
[pI] 5.61
[HOMOL] TREMBLNEW:AB033013_1 gene: "KIAA1187", product:
"KIAA1187 protein"; Homo sapiens mRNA for KIAA1187 protein,
partial cds. le-b 1 *
[BLOCKS] PFOimOD
[BLOCKS] BL00412D Neuromodulin (GAP-43) proteins
[BLOCKS] BLDUfl2bC
[BLOCKS] BL00422C Granins proteins
[BLOCKS] PRD01fe>7C
[BLOCKS] PFDDT e J2A
[BLOCKS] BL00224B Clathrin light chain proteins
[BLOCKS] PR000M1D
[BLOCKS] PROOIIQA
[KW] All_Alpha
[KW] LOW_COMPLEXITY
[KW] COILED_COIL 10.51 %

SEQ APSPAPSPTPAPPQKEQPPAETPTDAAVLTSPPAPAPPVTPSKPMAGTTDREEATRLLAE

10 SEG XXXXXXXXXXXXXXXXXXXXXXX . . XXXXXXXXXXXXXXXXXXXXX .-XX 

PRD cccccccccccccccccccccccccceeeeccccccccccccccccccchhhhhhhhhhh 



COILS 



SEQ KRRQAREQREREEQERRLQAERDKRMREEQLAREAEARAEREAEARRREEQEAREKAQAE

SEG XXXXXXXXXXXXXXXXXXXXXX- • • • XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

CCCCCCCCCCCCC 

20 

SEQ QEEQERLQKQKEEAEARSREEAERQRLEREKHFQQQEQERQERRKRLEEIMKRTRKSEVS

SEG XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccc 
COILS 

25 CCCCCCCCCCCCCCCCCC - 

SEQ ETKKQDSKEANANGSSPEPVKAVEARSPGLQKEAVQKEEPIPQEPQWSLPSKELPASLVN

SEG - • 

PRD chhhhhhhhhccccccccceeeeccccccchhhhhhhhcccccccccccccccccceeee 
30 COILS 

SEQ GLQPLPAHQENGFSTNGPSGDKSLSRTPETLLPFAEAEAFLKKAVVQSPQVTEVL

SEG - 

35 PRD ecccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhccccccccc 

COILS - • • • 

(No Prosite data available for DKFZphamy2_121f11.3)

(No Pfam data available for DKFZphamy2_121f11.3)
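The percentage attached to a [KW] keyword appears to be the share of residues flagged in the corresponding track (COILS or SEG). The sketch below checks that reading; it is an assumption about how the report derives the figure, the helper name is hypothetical, and the flagged counts (runs of 13 and 18 in the COILS rows above; runs of 16 and 2 in the SEG rows of the later 480-residue report) are taken by counting characters in the listing.

```python
def percent_flagged(flagged, length):
    """Share of flagged residues, as a percentage rounded to two decimals."""
    return round(100.0 * flagged / length, 2)


# This report: four full SEQ rows of 60 residues plus one of 55.
coiled_coil = percent_flagged(13 + 18, 4 * 60 + 55)

# The 480-residue report later in the listing: SEG flags 16 + 2 residues.
low_complexity = percent_flagged(16 + 2, 480)
```

Both values agree with the percentages printed next to COILED_COIL here and LOW_COMPLEXITY in the later report, which supports reading the garbled percent signs as such.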
DKFZphamy2_121m2

group: cell cycle

DKFZphamy2_121m2 encodes a novel 480 amino acid protein with
similarity to human PA26-T2 protein.

PA26-T2 is a p53 responsive gene. The protein is predominantly
expressed in brain, breast and kidney and may represent a
potential novel regulator of cellular growth. Isoforms are
differentially induced by genotoxic stress (UV, gamma-irradiation
and cytotoxic drugs) in a p53-dependent manner.

The new protein can find application in modulating cell division
and apoptosis pathways.

similarity to PA26 nuclear protein isoforms (Homo sapiens)

probably differential polyadenylation

Sequenced by DKFZ

Locus: unknown

Insert length: 3327 bp

Poly A stretch at pos. 3306, polyadenylation signal at pos. 3279



1 TCCAGCACCA AAGCGGCCGT TCTCGGATTC CGGAGCGTTC TGGAGCCCCG
51 AGAGACGCCC CGGGGTTCTA GAAGCTCCCC GGCGGCGCCC AGTCCCGGCT
101 TCATTCGGGC GTCCCTCCGA AACCCACTCG GGTGCACGGG TCGTCGGCGA
151 GCCGCGACCG GGTCCTGGCG CGCACCATGA TCGTGGCGGA CTCCGAGTGG
201 CGCGCAGAGC TCAAGGACTA CCTGCGGTTC GCCCCGGGCG GCGTCGGCGA
251 CTCGGGCCCC GGAGAGGAGC AGAGGGAGAG CCGGGCTCGG CGAGGCCCTC
301 GAGGGCCCAG CGCCTTCATC CCCGTGGAGG AGGTCCTTCG GGAGGGGGCT
351 GAGAGCCTCG AGCAGCACCT GGGGCTGGAG GCACTGATGT CCTCTGGGCG
401 AGTAGACAAC CTGGCAGTGG TGATGGGCCT GCACCCTGAC TACTTTACCA
451 GCTTCTGGCG CCTGCACTAC CTGCTGCTGC ACACGGATGG TCCCTTGGCC
501 AGCTCCTGGC GCCACTACAT TGCCATCATG GCTGCCGCCC GCCATCAGTG
551 TTCTTACCTG GTAGGCTCCC ACATGGCCGA GTTTCTGCAG ACTGGTGGTG
601 ACCCTGAGTG GCTGCTGGGC CTCCACCGGG CCCCCGAGAA GCTGCGCAAA
651 CTCAGCGAGA TCAACAAGTT GCTGGCGCAT CGGCCATGGC TCATCACCAA
701 GGAACACATC CAGGCCTTGC TGAAGACCGG CGAGCACACT TGGTCCCTGG
751 CCGAGCTCAT TCAGGCTCTG GTCCTGCTCA CCCACTGCCA CTCGCTCTCC
801 TCCTTCGTGT TTGGCTGTGG CATCCTCCCT GAGGGGGATG CAGATGGCAG
851 CCCTGCCCCC CAGGCACCTA CACCCCCTAG TGAACAGAGC AGCCCCCCAA
901 GCAGGGACCC GTTGAACAAC TCTGGGGGCT TTGAGTCTGC CCGCGACGTG
951 GAGGCGCTGA TGGAGCGCAT GCAGCAGCTG CAGGAGAGCC TGCTGCGGGA
1001 TGAGGGGACG TCCCAGGAGG AGATGGAGAG CCGCTTTGAG CTGGAGAAGT
1051 CAGAGAGCCT GCTGGTGACC CCCTCAGCTG ACATCCTGGA GCCCTCTCCA
1101 CACCCAGACA TGCTGTGCTT TGTGGAAGAC CCTACTTTCG GATATGAGGA
1151 CTTCACTCGG AGAGGGGCTC AGGCACCCCC TACCTTCCGG GCCCAGGATT
1201 ATACCTGGGA AGACCATGGC TACTCGCTGA TCCAGCGGCT TTACCCTGAG
1251 GGTGGGCAGC TGCTGGATGA GAAGTTCCAG GCAGCCTATA GCCTCACCTA
1301 CAATACCATC GCCATGCACA GTGGTGTGGA CACCTCCGTG CTCCGCAGGG
1351 CCATCTGGAA CTATATCCAC TGCGTCTTTG GCATCAGATA TGATGACTAT
1401 GATTATGGGG AGGTGAACCA GCTCCTGGAG CGGAACCTCA AGGTCTATAT
1451 CAAGACAGTG GCCTGCTACC CAGAGAAGAC CACCCGAAGA ATGTACAACC
1501 TCTTCTGGAG GCACTTCCGC CACTCAGAGA AGGTCCACGT GAACTTGCTG
1551 CTCCTGGAGG CGCGCATGCA AGCCGCTCTG CTGTACGCCC TCCGTGCCAT
1601 CACCCGCTAC ATGACCTGAC TCCTGAGCAG GACCTGGGCC CGGTTCAGCT
1651 CCCCACAAGG ACTTCTCTGT CTGGAGACAG CCCCAGACCC TTTTGTGTCC
1701 CATGCCCACC CTCCCCACGC TGCAGTGGGC TTGTGTGTGA TGTGCAGTCC
1751 CGAAGCCACA CCCTCCCTTT TCCTCACTGG AATGGACAGT TCATTGCACT
1801 GACTCTGGGA TCTCAGCCCT GCTCCTGGGA GCTGGAAGAG CACTTGGAGA
1851 TCCTAAGGGA CCACACCCTT CCTCCTTCCC CTGCCCACAG AGGCAGAGGG
1901 CACAGGAAAG AAGCCGGGCC AAGCTCGGAA TTAATGTGCC ACAAGTGTTG
1951 TGGCCTTCCT GAACTGGGAA GTCCCTGGCT GGCCCCCGGG GGAGAGGGGC
2001 AAATGCCTCC GGGACTGACA CTCCAGGCAG CTTTGCCTTC TCTCCCCTGT
2051 CATTTCCAGA TTTCATTACC TCCTACTTGC CATTCACCCA TCAATGTGAA
2101 AGTCAGGGTC ACAGCTGGTC TGTGTGTCCA GTTCCCTAAA AGCCTGTTCT
2151 GTTGGGCAGC CTGAGGCTGT TGCCCGAATC CTAGTTCAGT TTTTTGACTT
2201 CCTTTGCCCT TTTTCCCTTT TCTCCATGCT TAATGGTGTG AGGCGTCAGG
2251 AGAGAGGCCA AGTACATAAA AAAAAAAAAA AGCAGATTAT CTCTAGAGAG
2301 TTTGAGCCTT TGCTGGTCAC ATTGCCTTCT GAAGAGGAGG GAGTATTAGA
2351 TTATAAATCC TCTTTATTTT GGTCCTTTAT GCTTGAGGTT CCAACCTGGA
2401 GCCACAGTGT GTGAGAGGAG GAGGAGAGGG AGAATTCTGT TCTCCCAGAG
2451 CTGCACCTGC CTCGCAGAGG CCAGCACCCC ACTCTCCTGC CTCCAGTGGC
2501 CCTGCCGCAG ATGTCTCCCA AAAAGTTGAG CCTTTCTAGA TGGCTTAGGT
2551 GGCACCATGG CTCAGCAGGA GGGGCGGGAG GCACCAGGGT TCTTGTTTGG
2601 ACCCTGCCCC TGGGCCATGG CCAGGTGACC ATGGCTACAT TGCCAAACCT
2651 CTGACTGCCA CAGCTGCAGA CTGAGAGGGT GGGTCTGAGT CCCCACAATG
2701 TCTGAAGCTG CCCCTGGGAT TCTCAGGCCA ACCTGCCAAC AGCAAGCGGA
2751 TTTTCTTGCA AGATCAGGGA CCCCATTTCT GCAGCCAGTG TCTCCTGGGT
2801 GCCTTCTGAG GACTCCCACC CCCATCCCAG TATCTCATCT GTCCCCTCTC
2851 CTGGGGCTTA AGTGGGTTGC TTCCAGGCAG AAGCAGCCAA GGACCGATTC
2901 CAGGCACTTT CTGTAGCAAA TGACTGTGAA TTACGACTTC TCTTGCCCTT
2951 CTTCTAGCAG TCTGTGCCTC CTCTCTGACC AGTTTGGAGG GCACTGAAGA
3001 AAGGCAAGGG CCGTGCTGCT GCTGGGCGGG GCAGGAGAGG AGCCTGGCCA
3051 GTGTGCCACA TTAAATACCC GTGCAGGCGC GGAGAAGCAA CCGGCACCCC
3101 CTTCCGGCCT GAAAGCCCTC CCTGCAAGAA GGTGTGCAGG AGAGAAGAGG
3151 CCCCGGCATG GGGATCTGGG TTCTAGAGGG CATGTGATGA CTGTAAATGT
3201 TCACTGGGTG GGTAGGGAGT GGTATCCAGT GTTCAAGTGC AGAAATCTTT
3251 GGCTTTGCTA CCAGTTCCAT ATGATGAGAA ATAAACGTTC GCTGAGGTTT
3301 TGTTTCATAA AAAAAAAAAA AAAAAAA
BLAST Results

No BLAST result

Medline entries

^50Em70:
Buckbinder L., Talbott R., Seizinger B.R., Kley N.; Gene
regulation by temperature-sensitive p53 mutants: identification
of p53 response genes. Proc. Natl. Acad. Sci. U.S.A.
91(22):10640-10644 (1994).

11E4117:
Velasco-Miguel S., Buckbinder L., Jean P., Gelbert L., Talbott R.,
Laidlaw J., Seizinger B., Kley N.; PA26, a novel target of the p53
tumor suppressor and member of the GADD family of DNA damage and
growth arrest inducible genes. Oncogene 1999 Jan 7;18(1):127-37



Peptide information for frame 3

ORF from 177 bp to 1616 bp; peptide length: 480
Category: strong similarity to known protein
Classification: Cell division

1 MIVADSECRA ELKDYLRFAP GGVGDSGPGE EQRESRARRG PRGPSAFIPV
51 EEVLREGAES LEQHLGLEAL MSSGRVDNLA VVMGLHPDYF TSFWRLHYLL
101 LHTDGPLASS WRHYIAIMAA ARHQCSYLVG SHMAEFLQTG GDPEWLLGLH
151 RAPEKLRKLS EINKLLAHRP WLITKEHIQA LLKTGEHTWS LAELIQALVL
201 LTHCHSLSSF VFGCGILPEG DADGSPAPQA PTPPSEQSSP PSRDPLNNSG
251 GFESARDVEA LMERMQQLQE SLLRDEGTSQ EEMESRFELE KSESLLVTPS
301 ADILEPSPHP DMLCFVEDPT FGYEDFTRRG AQAPPTFRAQ DYTWEDHGYS
351 LIQRLYPEGG QLLDEKFQAA YSLTYNTIAM HSGVDTSVLR RAIWNYIHCV
401 FGIRYDDYDY GEVNQLLERN LKVYIKTVAC YPEKTTRRMY NLFWRHFRHS
451 EKVHVNLLLL EARMQAALLY ALRAITRYMT



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphamy2_121m2, frame 3

TREMBL:AF033120_1 gene: "PA26"; product: "p53 regulated PA26-T2
nuclear protein"; Homo sapiens p53 regulated PA26-T2 nuclear protein
(PA26) mRNA, complete cds., N = 1, Score = 1377, P = 1.7e-141

TREMBL:AF033122_1 gene: "PA26"; product: "non-p53 regulated PA26-T1
nuclear protein"; Homo sapiens non-p53 regulated PA26-T1 nuclear
protein (PA26) mRNA, complete cds., N = 1, Score = 1363, P = 3e-139

TREMBL:AF033121_1 gene: "PA26"; product: "p53 regulated PA26-T3
nuclear protein"; Homo sapiens p53 regulated PA26-T3 nuclear protein
(PA26) mRNA, complete cds., N = 1, Score = 1307, P = 2.5e-133



>TREMBL:AF033120_1 gene: "PA26"; product: "p53 regulated PA26-T2
nuclear protein"; Homo sapiens p53 regulated PA26-T2 nuclear
protein (PA26) mRNA, complete cds.

Length = 492

HSPs:

Score = 1377 (206.6 bits), Expect = 1.7e-141, P = 1.7e-141
Identities = 277/471 (58%), Positives = 334/471 (70%)
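The percentages on the Identities and Positives line follow from the match counts and the aligned length. In the sketch below, integer truncation is an assumption made because it reproduces the reported whole-number values, whereas rounding to nearest would not; the function name is hypothetical.

```python
def blast_percent(matches, aligned):
    """Whole-number percentage, truncated rather than rounded."""
    return 100 * matches // aligned


identity_pct = blast_percent(277, 471)
positive_pct = blast_percent(334, 471)

# For comparison: rounding to the nearest integer gives different values.
identity_rounded = round(100 * 277 / 471)
positive_rounded = round(100 * 334 / 471)
```

The truncated values match the figures in the alignment header, which makes them a quick consistency check when the percent signs or digits in the line are hard to read.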



fluery: 22 GVGDSGPGEEflRESRARRGPR GPSAFIPVEEVLREGAESLEflH- 

LGLEALI1SSGRV 7b 

G G 6 +<2 E R PR GPS FIP +E+L+ G+E + H L 

++ + GR+ 
15 Sb jet: 22 

GCK(2CGGGRDflDEELGIRIPRPLGflGPSRFIPEKEILl2VGSEDAflriHALFAI>SFAALGRL fil 

Ouery: 77 

DNLAVVnGLHPDYFTSFURLHYLLLHTDGPLASSURHYIAHIAAARHflCSYLVGSHHAEF 13b 
20 DN+ +VH HP Y SF + + LL DGPL +RHYI 

IMAAARHflCSYLV H+ +F 
Sbjct: 62 

DNITLVI1VFHP(JYLESFLKTl3HYLL(3(1I>GPLPLHYRHYIGII1AAARHi!ICSYLVNLH\/NDF 141 

25 fluery: 137 

LflTGGI>PEIilLLGLHRAPEKLRKLSEINKLLAHRPli)LITKEHI(3ALLKTGEHTIi)SLAELI(3 1Tb 
L GGDP+lilL GL AP+KL+ L E+NK+LAHRPULITKEHI+ LLK 

EH+USLAEL+ 
Sbjct: 142 

30 LHVGGI>PKULNGLENAPl3KLi2NLGELNKVLAHRPULITKEHIEGLLKAEEHSUSLAELVH 2D1 

fluery: 117 ALVLLTHCHSLSSFVFGCGILPEGD ADGXXXXXXXXXX 

XXXXXXXXRDPLNNS 241 

A+VLLTH HSL+SF FGCGI PE »G 

35 P+N++ 

Sbjct: 202 

AVVLLTHYHSLASFTFGCGISPEIHCDGGHTFRPPSVSNYCICDITNGNHSVDENPVNSA 2bl 

fluery: 250 GGF ESARDVEALMERH<2<2L<aESLLR»EG- 

40 TSflEEMESRFELEKSESLLVTPSADILE 30S 

+S +VEALME+N+QLOE R1>E SUEEM SRFE+EK ES+ V 

S+D E 

Sbjct: 2b2 ENVSVSDSFFEVEALMEKIIRtJLdEC-- 
RDEEEASflEEIIASRFEIEKRESIIFVF-SSDDEE 316 

45 

fluery: 3Db 

PSPHPDIILCFVEDPTFGYEDFTRRGAiJAPPTFRAflDYTIJEDHGYSLIdJRLYPEGGcJLLDE 3bS 
+P + ED ++GY+DF+R G P TFR flDY UEDHGYSL+ 

RLYP+ G<2L+DE 

50 Sbjct: 311 VTPARAVSRHFEDTSYGYKDFSRHGMHVP- 
TFRVflDYCUEDHGYSLVNRLYPDVGflLIDE 377 

fluery: 3bb 

KFQAAYSLTYNTIAMHSGVDTSVLRRAIUNYIHCVFGIRYDDYDYGEVNflLLERNLKVYI 425 
55 KF AY+LTYNT+AMH 

VDTS+LRRAHi)NYIHC+FGIRYDDYDYGE+N(2LL+R+ KVYI 
Sbjct: 37fl 

KFHIAYNLTYNT[1Af1HtCDVI>tSI1LRRAIIi)NYIHCnFGIRYI>DYDYGEINt2LLDRSFKVYI 437 
(Juery: HSb 

KT VACYPEKTTRRMYNLFIJRHFRHSEKVHVNLLLLEARnCAALLYALRAITRYnT MAD 
KTV C PEK T+RI1Y+ FUR F + HSEKVHVNLLL+EARf1£A 

5 LLYALRAITRYMT 
Sbjct: M3fl 

KTVVCTPEKVTKR(1YDSFliJR(3FKHSEKVHVNLLLIEARn(3AELLYALRAITRYnT ^ c ^2 
Pedant information for DKFZphamy2_121m2, frame 3

Report for DKFZphamy2_121m2.3

[LENGTH] 480
[MW] 54493.95
[pI] 5.57
[HOMOL] TREMBL:AF033120_1 gene: "PA26"; product: "p53
regulated PA26-T2 nuclear protein"; Homo sapiens p53 regulated
PA26-T2 nuclear protein (PA26) mRNA, complete cds. 1e-151
[BLOCKS] PR00049D
[KW] All_Alpha
[KW] LOW_COMPLEXITY 3.75 %

25 

SEQ MIVADSECRAELKDYLRFAPGGVGDSGPGEEQRESRARRGPRGPSAFIPVEEVLREGAES

SEG 

PRD cccchhhhhhhhhhhhhccccccccccccchhhhhhhccccccccccccchhhhhhhhhh 

30 

SEQ LEQHLGLEALMSSGRVDNLAVVMGLHPDYFTSFWRLHYLLLHTDGPLASSWRHYIAIMAA

SEG 

PRD hhhhhhhhhhhhhccccccceeeeccccchhhhhhhhhhhhhcccccchhhhhh 

SEQ ARHQCSYLVGSHMAEFLQTGGDPEWLLGLHRAPEKLRKLSEINKLLAHRPWLITKEHIQA

SEG 

PRD hhhhhheeeccccceeeecccccccccccccchhhhhhhhhhhhhhhhccceeehhhhhh 

SEQ LLKTGEHTWSLAELIQALVLLTHCHSLSSFVFGCGILPEGDADGSPAPQAPTPPSEQSSP

40 SEG • - xxxxxxxxxxxxxxxx 

PRD hhhhhcchhhhhhhhhhhhhhhhccccccccccccccccccccccccccccccccccccc 

SEQ PSRDPLNNSGGFESARDVEALMERMQQLQESLLRDEGTSQEEMESRFELEKSESLLVTPS

SEG xx 

45 PRD cccccccccccchhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhh 

SEQ ADILEPSPHPDMLCFVEDPTFGYEDFTRRGAQAPPTFRAQDYTWEDHGYSLIQRLYPEGG

SEG 

PRD ccccccccccceeeeccccccccccccccccccccceeeeeeccccccceeeeecccccc 

50 

SEQ QLLDEKFQAAYSLTYNTIAMHSGVDTSVLRRAIWNYIHCVFGIRYDDYDYGEVNQLLERN

SEG - 

PRD hhhhhhhhhhhhhcccceeecccccchhhhhhhhhhhhhhccccccccccchhhhh 

SEQ LKVYIKTVACYPEKTTRRMYNLFWRHFRHSEKVHVNLLLLEARMQAALLYALRAITRYMT

SEG • 

PRD hheeeeeeeecccchhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhh 
(No Prosite data available for DKFZphamy2_121m2.3)
(No Pfam data available for DKFZphamy2_121m2.3)
DKFZphamy2_121o17

group: transmembrane protein

DKFZphamy2_121o17 encodes a novel 212 amino acid protein without
similarity to known proteins.

The novel protein contains 1 transmembrane region.
No informative BLAST results; no predictive Prosite, Pfam or SCOP
motif.

The new protein can find application in studying the expression
profile of amygdala-specific genes and as a new marker for
amygdala cells.

unknown protein
Pedant: TRANSMEMBRANE 1
Sequenced by DKFZ
Locus: /map="186.6 cR from top of Chr22 linkage group"
Insert length: 2690 bp

Poly A stretch at pos. 2661, polyadenylation signal at pos. 2634

30 

1 TGCTGGGAAA AGTGACTGCG ATTCTGAAGA ACCGCTGCCT TGCAAGGTCA
51 AGGACATTCA GTGGTTGCTG GGGTCCGCAG ACTACTGCCA CCCACTCACC
101 ATCAACTCTG TTAGCCCAAT TGCCCTGCTG AACAACTGCC TGAATACAGG
151 CTTTAGGTTC CCCTGGACTC CAGCCAAGGC TGTTCAGGTG GGACCATGGT
201 GCTCTTTAAG CGTGATCGGA GGGAAGACAC ACAGCAGGGC CACCATTCCA
251 TGAATGGGAG GTGTACAGAT CACTTTCTCT TTGTGCTCAG TTCTCTTCTG
301 TCTCCAGCAG CTATATTGGT AAGACTAGTA CCTGCCAGGG AGAGGTGCCC
351 CCAAGTGAAG GGGTACAGTG GCACCTGGGA AAAGGCACCT GGAAGGTTTC
401 CATGTGGCCC AGCCCAGCAT GGAAGCAGGG TGGGAACTCT GCTGTGTCGC
451 CAGCCCTCAC TCTACTCAAG TGGCTTTTTG AGAGCCCTGC CATGTCTGTG
501 TCAGGCCTGT GCTGCTTCAC ACCCTACAGC TGCCTGGGAA AGGCCGGCCA
551 CGCTCCCTGT CCACACACTC CCTGTCCACA CACTCCCTGT CCACAACTGC
601 AGCCGGGCCC TCTGCCTATG GGCACCCAAT CCAAGCAGCT GCTCCACCTT
651 TGTTTGGCAT GGTGATTTGT GTTTTTTCTC TTGGTGCTTA TGTGTGTGGG
701 CTTGGGACGA GTGCTGGTAT GCACTTAGGA CCTTCTTGAT AGCTCCCTGC
751 ACTTTGGAAC ACGGAGCAGA TGAGAGAGGG TCAGGGGCTT GCCCTCCACC
801 TTGGACTTGG AAGAAGCCCA CATTGGAGAG GTGAGGACCC CATGGTGGCT
851 CTAGTGGAAG ATACGTTAGT CTCCAGCTAA GGAGGATGAG GCGCAGCCCC
901 AGAGGGAGAC CTCAGTGATA GGGGATCAGG CTACGAAAGT GGGGGkkGGG
951 AGATGCTTTG TACATATTTT GGGGTTATAA TTTCTCTAAA TTTTAGGAGA
1001 ACGGGTATTG ATTGATAAAA GGGACAGGCA GTAGTGTTCA ACAGTGCATG
1051 TGAAGGAAAG TTCTGTTTTC CATGGTTTTG ACATTCTTTG GACTGTATTG
1101 TGACTGCTGT CTGGTCCACA TGGTACCCTT TTGGTAAGTA GGCTTCAGTG
1151 CATACCAGGG TATCACTGGA GATGGGAGTT AGTGAAGGGG TGACTCCCTG
1201 GCCTAGTATA GTGTGACCCT GGGACAACTT AATGTCCTAA AGCATTTTGG
1251 TGACTTCTAG GGAATAGCAA AGACCTATTT CATTGTCCCC AGGTAAGTAT
1301 GTGATGAGCA ATGAGGAGGA GTGGAAAACA AAACCCAGAA AGTGCGGCAG
1351 GACCAGCCTG ACGCACACGC TCCTGTTGTC ATGGCAGACA GCCGCCTTGG
1401 GTGGGCACCA CCCTGGCAGT TCCAGCCTGT AGGGGAGTGA AGGGACATGG
1451 CTGAGCTGGG CATGTGCTGA GGTTGACTTA GGGAACAAGC CCTGGGATTG
1501 GACAAAAGGG CCCATGCTGC AGCCACTGAC TGGGGGCAGA GCTCTGGGTG
1551 GAAGAGGGAA GAGATCCTAA TGGAGGCGCC TCCATCTGCA ACCACAGTTG
1601 TAAGGCTCAT GGCACCTCTG CTTGGAAAGC ACTGGTTTAG GGACTTAGAG
1651 AGGTAGGCAC AAGGTGGGTC TCCTGGGTAA GGGAAGCAAG AGCAGACTGT
1701 TGGGCCAACA GGAGAAGCTC CCCAGAGTAG GGGAGAAGGT TGGGGTGTAG
1751 GGCCTTCCAC GTGGAACAGA CAGCCCCTGT GTCTCTGTCT CTTGGGGACC
1801 TGAGTTTGGG TGGGGTGGCA GTTGGCACAG CGCAGATGCG GTAGAGATGG
1851 GAGGAAACCC AGCTCCTCAC TTCCGTGTGC CTCATGCCTT TGCATACACA
1901 AGCACCAAAC CTACTAGGTC TTCTCATTAC CCATGTAAAC CACAfGTTAG
1951 ATAAATTTTT GCAAGTAGAG GAAAGAAGGA AATAAAACAT CACATTTTGG
2001 TGTCTCTCAG GCTTTCCCCC CCAACTATGG TTTCTTTGCT TTTTGTTTTA
2051 ACATAGTTTT GTTGCTGTCT TCTGTAATGA TACAGTTTTG TGCAGCTGTT
2101 TTCACTTAGC ATATCGTGGG CATCTCCCCT TATGATTACT AAATATTTTA
2151 TTTTGGAGTG GCTGTGTACT CTCCCATTGA CTAGATGGAC CATTGTGCCA
2201 GTTGCCAATC ACTAATGCTG TTACTAACTT TTCAGTTATA AATTGATGAA
2251 TATCTTTGTG CACAGGCTGT TTCCCAATGT CAAGTTATTA GGGTAGACTC
2301 CAGGAGGTGG GATTCTTCAA CTAAAGAATA TGAAAACCTT TGAGGCTTTT
2351 ACTACATATT GACAAAATGG TTTCCGGAAA TATTTGTATC CCCTTACACT
2401 GCCACCAGCA AGGATAAACA TGTCCATCTT GCCCGTATTG GGAATTATCA
2451 TCTGGCTAAA TATTTGCTAA TTTGATAATG AAAAAATAGC ATCGTGTTTC
2501 AGTTGGCATT TCACTGACTT CTAGCACGGT TGAACATCTT TCATGTGGAG
2551 CGATTGTATT TCCTCCTTTG TGGATTGTCA GTGTCCTTTG CTCTATCTTC
2601 TGGGGTCAGA TAAATTTGTA TGAGCTCGGT ATATATTAAA GATATTAACC
2651 TGGTGTGTGT CAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
BLAST Results

Entry HS1033E15 from database EMBL:
Human DNA sequence from clone 1033E15 on chromosome 22q13.1-13.2.
Contains part of a novel gene, ESTs and a GSS.
Score = 5^1^, P = 5.1e-262, identities = 1167/11=15

Entry HSN12A A1E from database EMBL:
Human DNA sequence from cosmid N1ESA1E on chromosome 22q12-qter
contains ESTs, CpG island.
Score = 5036, P = 0.0e+00, identities = lDm/lOlS

Entry HSb^COMb from database EMBL:
human STS UI-mD3M.
Score = 1800, P = 1.4e-76, identities = 392/417

Medline entries

No Medline entry

Peptide information for frame 1

ORF from bp to 831 bp; peptide length: 212
Category: putative protein
Classification: no clue

1 MVLFKRDRRE DTQQGHHSMN GRCTDHFLFV LSSLLSPAAI LVRLVPARER
51 CPQVKGYSGT WEKAPGRFPC GPAQHGSRVG TLLCRQPSLY SSGFLRALPC
101 LCQACAASHP TAAWERPATL PVHTLPVHTL PVHNCSRALC LWAPNPSSCS
151 TFVWHGDLCF FSWCLCVWAW DECWYALRTF LIAPCTLEHG ADERGSGACP
201 PPWTWKKPTL ER

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphamy2_121o17, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphamy2_121o17, frame 1

Report for DKFZphamy2_121o17.1

[LENGTH] 212
[MW] 23727.55
[pI] 8.73
[KW] TRANSMEMBRANE 1

SEQ MVLFKRDRREDTQQGHHSMNGRCTDHFLFVLSSLLSPAAILVRLVPARERCPQVKGYSGT
PRD ccchhhhhcccccccccccccccccchhhhhhhhccccceeeeecccccccccccccccc 

mem nnnnniinnnnnrinminii 

SEQ WEKAPGRFPCGPAQHGSRVGTLLCRQPSLYSSGFLRALPCLCQACAASHPTAAWERPATL
PRD ccccccccccccccccccceeeeeccccccccccccccccchhhhhhccccccccccccc 
nzn 



SEQ PVHTLPVHTLPVHNCSRALCLWAPNPSSCSTFVWHGDLCFFSWCLCVWAWDECWYALRTF
PRD ccccccccccccccccceeeeecccccccceeeecccceeecccceeeeccchhhhhhhe 



nzn 



SEQ LIAPCTLEHGADERGSGACPPPWTWKKPTLER
45 PRD eeeccccccccccccccccccccccccccccc 
HEM 

(No Prosite data available for DKFZphamy2_121o17.1)

(No Pfam data available for DKFZphamy2_121o17.1)
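The single transmembrane region reported for this protein can be made plausible with a simple hydropathy scan. The sketch below swaps in the Kyte-Doolittle scale with a 19-residue sliding window; this is not stated to be the method behind the PEDANT keyword, the window width is an assumption, and `best_window` is a hypothetical helper.

```python
# Kyte-Doolittle hydropathy values (standard published scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def best_window(seq, width=19):
    """Return (1-based start, mean hydropathy) of the most hydrophobic window."""
    best_start, best_avg = 0, float("-inf")
    for i in range(len(seq) - width + 1):
        avg = sum(KD[res] for res in seq[i:i + width]) / width
        if avg > best_avg:
            best_start, best_avg = i, avg
    return best_start + 1, best_avg


# The 212-residue peptide of this entry, in 60-residue rows.
peptide = (
    "MVLFKRDRREDTQQGHHSMNGRCTDHFLFVLSSLLSPAAILVRLVPARERCPQVKGYSGT"
    "WEKAPGRFPCGPAQHGSRVGTLLCRQPSLYSSGFLRALPCLCQACAASHPTAAWERPATL"
    "PVHTLPVHTLPVHNCSRALCLWAPNPSSCSTFVWHGDLCFFSWCLCVWAWDECWYALRTF"
    "LIAPCTLEHGADERGSGACPPPWTWKKPTLER"
)
start, mean_hydropathy = best_window(peptide)
```

On this peptide the most hydrophobic window falls in the first SEQ row, in the stretch beginning FLFVLSSLLS, which is the region the MEM track marks.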



DKFZphamy2_12d7

group: signal transduction

DKFZphamy2_12d7 encodes a novel 552 amino acid protein, which is
a so far unknown alternative spliced form of disks large homolog
DLG2.

It seems to be predominantly expressed in the retina, germ cells
and brain. It contains an SH3 domain and a guanylate kinase
domain. These conserved regions are shared among members of the
discs-large family of proteins that include human p55, a membrane
protein expressed in erythrocytes, rat PSD-95/SAP90, a synapse
protein expressed in brain, Drosophila dlg-A, a septate junction
protein expressed in various epithelia, and human and mouse ZO-1
and canine ZO-2, two tight junction proteins. The Drosophila
homologue, dlg-A, acts as a tumor suppressor. All members of
this family may be involved in signal transduction.

The new protein can find application in modulating/blocking
intracellular signal transduction pathways.

similarity to disks large homolog DLG2 (Homo sapiens)

alternative splicing: see DLG2
complete cds.

frame shift: around position 1437 one C too many

Sequenced by EMBL

Locus: /map="338.6 cR from top of Chr17 linkage group"
Insert length: 4220 bp

Poly A stretch at pos. 4160, polyadenylation signal at pos. mbS



1 CCCGGCTGCG CTGGAGCCGC CCGGAGCTAG GGGCTTCCCG GGGCGCAGGA
51 GAGACGTTTC AGAGCCCTTG CCTCCTTCAC CATGCCGGTT GCCGCCACCA
101 ACTCTGAAAC TGCCATGCAG CAAGTCCTGG ACAACTTGGG ATCCCTCCCC
151 AGTGCCACGG GGGCTGCAGA GCTGGACCTG ATCTTCCTTC GAGGCATTAT
201 GGAAAGTCCC ATAGTAAGAT CCCTGGCCAA GGCCCATGAG AGGCTGGAGG
251 AGACGAAGCT GGAGGCCGTG AGAGACAACA ACCTGGAGCT GGTGCAGGAG
301 ATCCTGCGGG ACCTGGCGCA GCTGGCTGAG CAGAGCAGCA CAGCCGCCGA
351 GCTGGCCCAC ATCCTCCAGG AGCCCCACTT CCAGTCCCTC CTGGAGACGC
401 ACGACTCTGT GGCCTCAAAG ACCTATGAGA CACCACCCCC CAGCCCTGGC
451 CTGGACCCTA CGTTCAGCAA CCAGCCTGTA CCTCCCGATG CTGTGCGCAT
501 GGTGGGCATC CGCAAGACAG CCGGAGAACA TCTGGGTGTA ACGTTCCGCG
551 TGGAGGGCGG CGAGCTGGTG ATCGCGCGCA TTCTGCATGG GGGCATGGTG
601 GCTCAGCAAG GCCTGCTGCA TGTGGGTGAC ATCATCAAGG AGGTGAACGG
651 GCAGCCAGTG GGCAGTGACC CCCGCGCACT GCAGGAGCTC CTGCGCAATG
701 CCAGTGGCAG TGTCATCCTC AAGATCCTGC CCAGCTACCA GGAGCCCCAT
751 CTGCCCCGCC AGGTATTTGT GAAATGTCAC TTTGACTATG ACCCGGCCCG
801 AGACAGCCTC ATCCCCTGCA AGGAAGCAGG CCTGCGCTTC AACGCCGGGG
851 ACTTGCTCCA GATCGTAAAC CAGGATGATG CCAACTGGTG GCAGGCATGC
901 CATGTCGAAG GGGGCAGTGC TGGGCTCATT CCCAGCCAGC TGCTGGAGGA
951 GAAGCGGAAA GCATTTGTCA
1001 GGACCCTAT6 CGGCAGCCTT 
1DS1 TTGACCACCA A6AATGCAGA 
11D1 GGAGGTGGCC CGCATGCCCC 
5 1151 GGGCTCAGGG CGTGGGACGG 
1BD1 GATCCAGATC GCTATGGCAC 
1ES1 AGACTCAGAG CGGGAAGGTC 
13D1 TGGAGGCTGA CGTCCGTGCT 
13S1 GGCAACCTGT ATGGCACACG 
10 1401 TGGGAAGGTG TGCGTGCTGG 
1HS1 CGAACGGCCG AGTTTGTCCC 
1SD1 CGAGACCCTG CGGGCCATGA 
1551 CCAAGCAGCT CACGGAGGCG 
lt.01 CGCATCCAGC GGGGCTACGG 
15 lb51 CAACCTGGAG AGGACCTTCC 
1701 GGACAGAGCC CCAGT6GGTG 
1751 CCTGGTCCTT GGCTCACTCT 
1601 CCTCCTGACC TGTGACCCCC 
IflSl GTCCTTGGGT AACAGCTCCC 
20 1101 GGCGTGCACT GCCAGGGAGG 
1151 TGCTGCCCAC TCCTGATGCC 
SD01 GCTATGCCCA GGAATGTGTC 
2051 AAGAGAAAAG CTGCTTTGGG 
2101 CCACCCCTCC CCAGTCACCA 
25 2151 ATTCCTGGAC TCCTCCCACC 
2201 GGGCTGTTTC CGTGTGACCA 
2251 AGGCCCGGGT GGTGGTGCCA 
2301 CAGAGTGAGA GCCAGTG6CC 
2351 CTTTGGAAAG GGACAGGGTC 
30 21101 ATCCACAGCT TCTCACTGCC 
2451 TGACAGGTCA GCCCTGCTCC 
2501 GCTCAGCCCA GGTAGGGGCA 
2551 CTGTTACCAA CTGAAGAGCC 
2b01 GGTCTGAGCT CTATGTCCTT 
35 2bSl CAGGTCCAGG TAGCCCACTT 
2701 GGAGGAGTGC AGAGGGGACC 
2751 CTCCAGGAGG TTCCTCACAC 
SflOl TCTGTACAAC CTGTGGTTCC 
2SS1 TGGCACCAGG TTGTGTGTGT 
40 2101 GTGTCAGGTT TAGTTTGGGG 
2151 CTCTTTGGGG CCCCTTTCTG 
3001 ATACCCTGAT CCCAGACTCC 
3051 CCTTGTCTTA TTGTCCCCCT 
3101 GAGGGCAGTT TTGTAAAATA 
45 3151 TAGATGTACT TGGGCATCTC 
3201 GGGGAGCCTG TCCTCAGAGG 
3251 CTTGTGCCTC CCAGTTCTTC 
3301 CAGCCAAAGC TGGCCCCTGA 
3351 GAAGGAGAAT AAACAGAATA 
50 3401 TACTAGGAAT CTCATTTGCA 
3451 GGCCAGGCCT GCCCCCATCT 
3501 TCATTCTCCT GCTCCTCTTT 
3551 CAGCTCTGCC TTGCATCACC 
3b01 GTGTTTTCTT GCTGACCTGA 
55 3b51 TCTCTGGGGG ATTTGAGGGT 
3701 GCTTCTGCCC CCTTGATCCA 
3751 GGGTACCTGG CCTTCATGGC 
3301 GGCCTGGGGC CCTTCCCTCC 



AGA6GGACCT GGAGCTGACA CCAAACTCAG 
TCAGGAAAGA AAAAGAAGCG AATGATGTAT 
GTTTGACCGT CATGAGCTGC TCATTTATGA 
CGTTCCGCCG GAAAACCCTG GTACTGATTG 
CGCAGCCTGA AGAACAAGCT CATCATGTGG 
CACGGTGCCC TACACCTCCC GGCGGCCGAA 
AGGGTTACAG CTTTGTGTCC CGTGGGGAGA 
GGGCGCTACC TGGAGCATGG CGAATACGAG 
TATTGACTCC ATCCGGGGCG TGGTCGCTGC 
ATGTCAACCC CCAGGCCGGT GAAGGTGCTA 
TTACGTGGTG TTCATCGAGG CCCCAGACTT 
ACAGGGCTGC GCTGGAGAGT GGAATATCCA 
GACCTGAGAC GGACAGTGGA 6GAGAGCAGC 
GCACTACTTT GACCTCTGCC TGGTCAATAG 
GC6AGCTCCA GACAGCCATG GAGA AGCTAC 
CCTGTCAGCT GG6TGTACTG AGCCTGTTCA 
GTGTTGA AAC CCAGAACCT6 AATCCATCCC 
TGCCACA ATC CTTAGCCCCC ATATCTGGCT 
AGCAGGCCCT AAGTCTGGCT TCAGCACAGA 
TGGGCATTCA TGGGGTACCT TGTGCCCAGfi 
CATTGGTCAC CAGATATCTC TGAGGGCCAA 
AGAGTCACCT CCATAATGGT CAGTACAGAG 
ACCACATGGT CAGTAGGCAC ACTGCCCCTG 
6TTCTCCTCT GGACTGGCCA CACCCACCCC 
TCTCACCCCT GTGTCGGAGG AACAGGCCTT 
GGGGAATGTG TGGCCCGCTG GCAGCCAGGC 
GCCTGGTGCC ATCTTGAAGG CTGGAGGAGT 
ACAGCTGCAG AGCACTGCAG CTCCCAGCTC 
GCAGGGCA6A TGCTGCTCGG TCCTTCCCTC 
GAAGTTTCTC CAGATTTCTC CAATGTGTCC 
CGACAGGGCC AGGCT6GCAG GGGCCATTGG 
GGATGGAGGG CTGAGCCCTG TGACAACCTG 
CCAAGCTCTC CATGGCCCAC AGCAGGCACA 
GACCTTGGTC CATTTGGTTT TCTGTCTAGC 
GCATCAGGGC TGCTGGGTTG GAGGGGCTAA 
TTGGGAGCCT GGGCTTGAAG GACAGTTGCC 
ACAACTCCAG AGGCGCCATT TACACTGTAG 
ACGTGCATGT TCGGCACCTG TCTGTGCCTC 
GTGCGTGTGC ACGTGCGTGT GTGTGTGTGT 
AGGAAGCAAA GGGTT TTGTT TTGGAG6TCA 
GGGGTTCCCC ATCAGCCCTC ATTTCTTATA 
A AAGCCCTGG TCCTTTCCTG ATGTCTCCTC 
ACCCTAAATG CCCCCCTGCC ATAACTTGGG 
GGAGACTCCC TTTAAGAAAG AATGCTGTCC 
ATCCTTCATT ATTCTCTGCA TTCCTTCCGG 
GGACAACCTG TGACACCCTG AGTCCAAACC 
CAAGTGTCTA ACTAGTCTTC GCTGCAGCGT 
ACCACTGTGT GCCCATTTCC T AGGGAAGGG 
TTTATTACA A ATGTTAGAAT ATATTTCTTA 
TTTGCATAGA CTATACACAT GGGGTGGAAA 
CGTTGGTGTG GCTCTGCGTA TACTACACAC 
TCCCTTAGTC AGTGTCCTTT CATCCTGATT 
CTCAGCCTAA GGGAGTGGGA AGGAAATGGG 
GGCTATAGGG TCACTTGCCA TTTCCTACCT 
AGAGGCAGGG GAAGATCTGT TGTTGCAGTT 
AATGACCATC ATCTCTGATG GAGATGGGTT 
ACCTTCACTG CTAGGGATGC TCAAGGGGCA 
TGTCTCTTCT CGGTCTTTCC TCTCTGAGCA 
WO 01/98454 PCT/IB01/02050

3851 GCCTCCTACC TCCCCTGCCT GAGCCCTCAC TCCACAGCCC TCCCAGGTAC
3901 CTAGCAGAGG CTGTCAGTCC TTGGCTCACC TGGAACAGGG CTGGGGCTGG
3951 GTTGGAACAG GTGTGTGCCC CCACCACAGC TCTATGACTC TGTTCTCCCT
4001 CCCTGCCATT GTGGACTCTT GTATTTGAGG GACCTCAAGA GAGTGAGGAC
4051 CCTACCATCC ACTGTCCATA TTCAGTCCCA GCCCCAGTGC GCTTCCTCTG
4101 TTCCCTCCCT CAGCCATCCA ATTCTTGAGT TTTCTCACTG ATTGGTTTTC
4151 TTTCTTTTTC CTTGGATTAA ATGTGAAAGC AAAGAAAAAA AAAAAAAAAA
4201 AAAAAAAAAA AAAAAAAAAA



BLAST Results

No BLAST result



Medline entries 



96070428:

Mazoyer S, Gayther SA, Nagai MA, Smith SA, Dunning A, van
Rensburg EJ, Albertsen H, White R, Ponder BA. A gene (DLG2)
located at 17q12-q21 encodes a new homologue of the Drosophila
tumor suppressor dlg-A. Genomics 1995 Jul 1;28(1):25-31



Peptide information for frame 1

ORF from 82 bp to 1437 bp; peptide length: 452
Category: strong similarity to known protein
Classification: Cell signaling/communication
Prosite motifs: GUANYLATE_KINASE_1 (385-402)

1 MPVAATNSET AMQQVLDNLG SLPSATGAAE LDLIFLRGIM ESPIVRSLAK
51 AHERLEETKL EAVRDNNLEL VQEILRDLAQ LAEQSSTAAE LAHILQEPHF
101 QSLLETHDSV ASKTYETPPP SPGLDPTFSN QPVPPDAVRM VGIRKTAGEH
151 LGVTFRVEGG ELVIARILHG GMVAQQGLLH VGDIIKEVNG QPVGSDPRAL
201 QELLRNASGS VILKILPSYQ EPHLPRQVFV KCHFDYDPAR DSLIPCKEAG
251 LRFNAGDLLQ IVNQDDANWW QACHVEGGSA GLIPSQLLEE KRKAFVKRDL
301 ELTPNSGTLC GSLSGKKKKR MMYLTTKNAE FDRHELLIYE EVARMPPFRR
351 KTLVLIGAQG VGRRSLKNKL IMWDPDRYGT TVPYTSRRPK DSEREGQGYS
401 FVSRGEMEAD VRAGRYLEHG EYEGNLYGTR IDSIRGVVAA GKVCVLDVNP
451 QA



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphamy2_12d7, frame 1

No Alert BLASTP hits found

Peptide information for frame 2

ORF from bp to 1738 bp; peptide length: 100

Category: strong similarity to known protein
Classification: Cell signaling/communication
Prosite motifs: LEUCINE_ZIPPER (66-87)

1 VKVLRTAEFV PYVVFIEAPD FETLRAMNRA ALESGISTKQ LTEADLRRTV
51 EESSRIQRGY GHYFDLCLVN SNLERTFREL QTAMEKLRTE PQWVPVSWVY



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphamy2_12d7, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphamy2_12d7, frame 1

Report for DKFZphamy2_12d7.1



ELENGTHD Sib 

35 EMIO 5b45fl.3b 

EpIJ b-Sl 

CH0I10LJ PIR:A57b£3 disks large homolog DLGE - human 

EFUNCAO 03.. 03. 11 other nucleotide-metabol ism activities CS- 
cerevisiae-. YDRMSMcH 7e-15 

40 CFUNCATI f nucleotide metabolism and transport EH- influenzae! 
HI17M3J 3e-D7 

CBL0CKS3 PRD0a3i*F 

EBLOCKSJ BLOOfiSbC 

^BLOCKS! BLOOflSbB Guanylate kinase proteins 

45 EBLOCKSJ BLDOfiSbA Guanylate kinase proteins 

CSCOPH dlgky_ 3-E1-3i-l-l Guanylate kinase Cbaker's 

yeast (Saccharomyce fle-L|5 

ESCOPJ dlkwab_ E-Eb-LLE Cask/Lin-E EHuman (Homo 

sapiens) Me-34 

50 CEO E-7-M.fi Guanylate kinase fie-17 

EPIRKliU blocked amino end fie-1? 

CPIRKU3 phosphotransferase fie-17 

CPIRKUl monomer fie-17 

EPIRKUJ duplication 5e-E1 

55 EPIRKliU signal transduction 3e-EM 

EPIRKWJ alternative splicing 5e-E1 

EPIRKliU P-loop fle-17 

EPIRKtdJ acetylated amino end le-lb 

IPIRKIO membrane protein 1e-74 

EPIRKU3 magnesium 6e-17 

EPIRKtO ATP fle-1? 

ESUPFAfU SH3 homology ^e-74 

5 ESUPFAfO discs-large tumor suppressor 3e-2M 

ISUPFAMJ unassigned Ser/Thr or Tyi — specific protein kinases Se- 
ll 

ISUPFAMU protein kinase homology 5e-ll 

ISUPFAM3 GLGF domain homology Te-7H 

10 ESUPFAPO guanylate kinase fle-17 

ISUPFAMJ guanylate kinase homology Te-? 1 ! 

EPR0SITE3 GUANYLATE_KINASE_1 1 

EPFAfO Src homology domain 3 

IKUH Irregular 

15 CKldJ 3D 






SE<2 MPVAATNSETAMcJflVLDNLGSLPSATGAAELDLIFLRGIMESPIVRSLAKAHERLEETKL 
Igky- 

SEfl EAVR»NNLELVflEILRDLAi2LAE«2SSTAAELAHILCEPHF«aSLLETHDSVASKTYETPPP 
lgky- 

SE(2 SPGLPPTFSN(3PVPPDAVRHVGIRKTAGEHLGVTFRVEGGELViARILHGGPIVA(2(3GLLH 
Igky- 



30 SE<2 VGDIIKEVNGOPVGSDPRALflELLRNASGSVILKILPSYflEPHLPRtfVFVKCHFDYDPAR 
lgky- 

SEfl DSLIPCKEAGLRFNAGDLL<2IVNQ])DANIdW(2ACHVEGGSAGLIPS(2LLEEKRKAFVKRDL 
35 Igky- 

SE<2 ELTPNSGTLCGSLSGICKKKRHMYLTTKNAEFDRHELLIYEEVARI1PPFRRKTLVLIGA(3G 
Igky- 

40 CCEEEECTTT 

SEfi VGRRSLKNKLIMlJDPDRYGTTVPYTSRRPKDSEREGfiGYSFVSRGEflEADVRAGRYLEHG 
Igky- 

TCHHHHHHHHHHHTTTTEEECCEEECCCCTTTTTTTTTTEECCHHHHHHHHHHCCEEEEE 

45 

SE(3 EYEGNLYGTRIDSIRGVVAAGKVCVLDVNPflAGEGATNGRVCPLRGVHRGPRLRDPAGHE 
Igky- 

EETTEEEEEEHHHHHHHHHHCCEEEEECCHH 

50 SE(2 (JGCAGEUNIH(2AAHGGGPETDSGGE(?PHPAGLRALL 

lgky- - 



Prosite for DKFZphamy2_12d7.1

PS00856 385->403 GUANYLATE_KINASE_1 PDOC00670
Pfam for DKFZphamy2_12d7.1

HMM_NAME Src homology domain 3



*py VIALYDYqAqd pDELSFkEGDIIillEdsDD - UltdrgRnnn 

10 + V+ +DY + + + + L F GD ++I +++ + D+ WW + 

Query 228 VFVKCHFDYDPARDSLIPCKEAGLRFNAGDLLQIVNQDDANWWQACHVE 276

HMM TNG<2EGUIPSNYVEPi* 
15 ++ G + IPS +E+ 

Query 277 GG-SAGLIPSQLLEEK 291



Pedant information for DKFZphamy2_12d7, frame 2

Report for DKFZphamy2_12d7.2



ELENGTHID 175 
EMliO 117E1.10 



EpIJ T-b^ 

CHOnOLJ PIR:A57bS3 disks large homolog DLG2 - human ?e-53 

30 EPIRKIO membrane protein le-13 

ESUPFAMID SH3 homology le-13 

ESUPFAfU GLGF domain homology le-13 

CSUPFAMJ guanylate kinase homology le-13 

EPROSITEJ LEUCINE_ZIPPER 1 

35 EKUIB Alphajeta 



SEQ MAPRCPTPPGGRKTQSGKVRVTALCPVGRWRLTSVLGATWSMANTRATCMAHVLTPSGAW

PRD ccccccccccccccccceeeeeeeccccccceeeeeccccccchhhhhhhhhhccccccc 

SEQ SLLGRCACWMSTPRPVKVLRTAEFVPYVVFIEAPDFETLRAMNRAALESGISTKQLTEAD

PRD ccccceeeeecccchhhhhhhhhcceeeeeeeccchhhhhhhhhhhhhccccchhhhhhh 



SEQ LRRTVEESSRIQRGYGHYFDLCLVNSNLERTFRELQTAMEKLRTEPQWVPVSWVY
45 PRD hhhhhhhhhhhhhhhhhheeeeeecccchhhhhhhhhhhhhhhhccccccccccc 



Prosite for DKFZphamy2_12d7.2
PS00029 141->163 LEUCINE_ZIPPER PDOC00029

(No Pfam data available for DKFZphamy2_12d7.2)
DKFZphamy2_12g7 



group: amygdala derived

DKFZphamy2_12g7 encodes a novel 254 amino acid protein without
similarity to known proteins.

No informative BLAST results; no predictive Prosite, Pfam or SCOP
motif.

The new protein can find application in studying the expression
profile of amygdala-specific genes.

putative protein
Sequenced by EMBL
Locus: unknown
Insert length: 1257 bp

No poly A stretch found, no polyadenylation signal found



1 CTCCAAGACT TCCTTGCTGT GAGGCTCGTG TGGACCCCAG AGCATGCACA
51 GGCTGTTTAC TCCACAGAGT GGCTTTGAGA ATCAGATGAG ACTGTGCTGG
101 CGAAGGCCCT GTGGGAATGA GGAACGCTGT AGTGTTTGCT GGTCCCTGTT
151 TCTGCCCCCA GGAAAGCAGC TGTGTGAGGA GGAGCGCCGG GCCATGCAGG
201 CTGCCCTGGA CTCCGTCGTC TGCCACACGC CCCTCAACAA CCTTGGCTTT
251 TCCCGGAAGG GCAGCGCGCT CACCTTCAGT GTGGCCTTCC AGGCTCTGAG
301 GACGGGGCTC TTCGAGCTAA GCCAGCACAT GAAACTGAAG CTGCAGTTCA
351 CCGCCAGCGT GTCCCACCCT CCACCCGAGG CCCGGCCCCT CTCCCGCAAG
401 AGCAGCCCCA GAAGCCCTGC TGTCCGGGAC TTGGTGGAGA GGCATCAGGC
451 TAGCCTGGGC CGCTCCCAGT CCTTCTCCCA CCAGCAGCCT TCCCGAAGCC
501 ACCTCATGAG GTCGGGCAGT GTGATGGAGC GCAGAGCATC ACGCCCCCTG
551 TGGCCTCTCC TGTTGGCCGC CCCCTCTACC TGCCCCCGGA CAAGGCTGTG
601 TTGTCTCTGG ACAAGATTGC CAAGCGCGAG TGCAAGGTCC TGGTGGTGGA
651 ACCCGTCAAG TAGCACCGTG CCAGCTCTGT TCCCTCTTAC ACTCCAGAGA
701 CCCAACGCCC CCAGAGGGTA TCCTTGCTCC CGGGCTGTGC CTCCCCTGGG
751 ATGCCTCCCA GACGGGGGTG AAGAGGCCTG GCAGAGCTGC CTGTCTTGTG
801 TCTGCTGATG AGGGATGGGG GAAGAAGCTG TGAAGTGGGC GGGCATGGCT
851 GGGACTAAGC CACCAGTATT CCCCGACGTT CCTGTGGGGG GGGCTGGCCC
901 ACCCCTAGGC CAGGGCAAGG GTTCCCAGAG CTCCCTTGTC CCCGGCCCTT
951 TACCCTGGTT CTGAGTTTAC AAAGTCTCTT CCTCATTCCC GTTGAGTTCT
1001 TTCCCACCTC TGACATTCCC TCCCTCCCTC CCGCAGGCTG AGATTAGAGG
1051 GTGGTGATGG CTAAGGGCCC CTGACAGTGA CCTTCCTGTC TCAGGGGTTG
1101 GGGACAGGGC CAGGTAGCCT CCTGCCCCTT ATGTTTACGT TTGCAGCCTG
1151 AAGCACTTTA ATTTTTTTTT TTTTTGGTCT GTGCCTGTAA CTAATTTTCC
1201 AACTATTGCT TCCAACTGAA ATAAGACTAT TAAATGCCTG TTCAGAGGGA
1251 AAAAAAA

BLAST Results

No BLAST result

Medline entries

No Medline entry

Peptide information for frame 2

ORF from 44 bp to 805 bp; peptide length: 254
Category: putative protein
Classification: no clue

1 MHRLFTPQSG FENQMRLCWR RPCGNEERCS VCWSLFLPPG KQLCEEERRA
51 MQAALDSVVC HTPLNNLGFS RKGSALTFSV AFQALRTGLF ELSQHMKLKL
101 QFTASVSHPP PEARPLSRKS SPRSPAVRDL VERHQASLGR SQSFSHQQPS
151 RSHLMRSGSV MERRASRPLW PLLLAAPSTC PRTRLCCLWT RLPSASARSW
201 WWNPSSSTVP ALFPLTLQRP NAPRGYPCSR AVPPLGCLPD GGEEAWQSCL
251 SCVC
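The ORF coordinates and the peptide length given for each reading frame are linked by simple arithmetic. The following is a minimal sketch of that check, assuming the coordinates are 1-based and inclusive and that the stop codon lies outside the stated range (an assumption inferred from the listings, not stated in the text). The function name `orf_peptide_length` is hypothetical.

```python
def orf_peptide_length(start: int, end: int) -> int:
    """Return the number of codons in an ORF.

    Assumes 1-based, inclusive nucleotide coordinates and that the
    stop codon is not part of the range (assumption, see lead-in).
    """
    span = end - start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"ORF span of {span} nt is not a positive multiple of 3")
    return span // 3


# Example: an ORF running from nucleotide 44 to nucleotide 805
print(orf_peptide_length(44, 805))  # 254
```

A span that is not a multiple of three raises an error, which is a quick way to catch a misread coordinate.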

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphamy2_12g7, frame 2

No Alert BLASTP hits found

Pedant information for DKFZphamy2_12g7, frame 2

Report for DKFZphamy2_12g7.2



40 CLENCTH3 25M 

EfllO 2fiM7T. c *l 

EpIJ ID -DO 

CBL0CKS3 BLD1D13C Oxysterol-binding protein family proteins 

CKU3 Alpha_Beta 

45 EKIO LOhLCONPLEXITY 4-72 B ^ 

SEQ MHRLFTPQSGFENQMRLCWRRPCGNEERCSVCWSLFLPPGKQLCEEERRAMQAALDSVVC

SEG - • 

50 PRD ccccccccccccchhhhhhcccccccceeeeeeeeeccccccchhhhhhhhhhhhhheee 

SEQ HTPLNNLGFSRKGSALTFSVAFQALRTGLFELSQHMKLKLQFTASVSHPPPEARPLSRKS

SEG ■ • - xxxxxxx 

PRD cccccccccccccceeeehhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccc 



SEQ SPRSPAVRDLVERHQASLGRSQSFSHQQPSRSHLMRSGSVMERRASRPLWPLLLAAPSTC

SEG xxxxx 

PRD ccccchhhhhhhhhhhhcccccccccccccceeeecccchhhhhhccccccccccccccc 
SEQ PRTRLCCLWTRLPSASARSWWWNPSSSTVPALFPLTLQRPNAPRGYPCSRAVPPLGCLPD

SEG • 

PRD cccceeeeeccccccccceeeccccccccccccccccccccccccccccccccccccccc 

SEQ GGEEAWQSCLSCVC

SEG 

PRD cchhhhhhhhhccc 



(No Prosite data available for DKFZphamy2_12g7.2)
(No Pfam data available for DKFZphamy2_12g7.2)
DKFZphamy2_12i1

group: amygdala derived

DKFZphamy2_12i1 encodes a novel 283 amino acid protein with weak
similarity to F41E6.3 of Caenorhabditis elegans.

No informative BLAST results; no predictive Prosite, Pfam or SCOP
motif.

The new protein can find application in studying the expression
profile of amygdala-specific genes.

putative protein
Sequenced by EMBL
Locus: /map="3"
Insert length: 2528 bp

Poly A stretch at pos. 2515, polyadenylation signal at pos. 2491



1 ATATAGTTGG ATCAAACAAA AACAACACAA TTTGTCCCGA TAATTATCAA
51 ACAGCACAGC TACTTGCCTT AATTTTAGAG TTACTCACAT TTTGTGTGGA
101 ACATCACACA TATCACATAA AAAACTATAT TATGAACAAG GACTTGCTAA
151 GAAGAGTCTT GGTCTTGATG AATTCAAAGC ACACTTTTCT GGCCTTGTGT
201 GCCCTTCGCT TTATGAGGCG GATAATTGGA CTTAAAGATG AATTTTATAA
251 TCGTTACATC ACCAAGGGAA ATCTTTTTGA GCCAGTTATA AATGCACTTC
301 TGGATAATGG AACTCGGTAT AATCTGTTGA ATTCAGCTGT TATTGAGTTG
351 TTTGAATTTA TAAGAGTGGA AGATATCAAG TCTCTTACTG CCCATATAGT
401 TGAAAACTTT TATAAAGCAC TTGAATCGAT TGAATATGTT CAGACATTCA
451 AAGGATTGAA GACTAAATAT GAGCAAGAAA AAGACAGACA AAATCAGAAA
501 CTGAACAGTG TACCATCTAT ATTGCGTAGT AACAGATTTC GCAGAGATGC
551 AAAAGCCTTG GAAGAGGATG AAGAAATGTG GTTTAATGAA GATGAAGAAG
601 AGGAAGGAAA AGCAGTTGTG GCACCAGTGG AAAAACCTAA GCCAGAAGAT
651 GATTTTCCAG ATAATTATGA AAAGTTTATG GAGACTAAAA AAGCAAAAGA
701 AAGTGAAGAC AAGGAAAACC TTCCCAAAAG GACATCTCCT GGTGGCTTCA
751 AATTTACTTT CTCCCACTCT GCCAGTGCTG CTAATGGAAC AAACAGTAAA
801 TCTGTAGTGG CTCAGATACC ACCAGCAACT TCTAATGGAT CCTCTTCCAA
851 AACCACAAAC TTGCCTACGT CAGTAACAGC CACCAAGGGA AGTTTGGTTG
901 GCTTAGTGGA TTATCCAGAT GATGAAGAGG AAGATGAAGA AGAAGAATCG
951 TCCCCCAGGA AAAGACCTCG TCTTGGCTCA TAAAATATTT ATTAGGGGAC
1001 CCTCAACATG TGGTCTTACA ATGCTGCAAC TGTTCAGTGA GCTGAAAATC
1051 TGAATCAGAA AGCTTTCTCA ATTGAACTTA TAAAATATAC AAGGAGTAGC
1101 AAAAGACAGT ATATCAGCTA AGAGAGTTTA GTTCTAATAA AAATCAGGCT
1151 TCCCAGGAAC TTGATTGCTT GCTAGTAATT AAGGGGTTTG CCTTTTAGGC
1201 TGTCAAAACA AACATTAGTA ACCAGAACCT GGGAGATAGC TTCTCAGCAA
1251 GGAAAAGTCA CAGGTTTGGG GACGGTTTAG GGGAGGGGAA AAGGTTGATA
1301 TAATAATGCA GGGTTGCTCC TCGGGGTGTC GATCTAGAAA CAATTTTACA
1351 GAACTTCAGT TGTAAACTCA ATAACATTAC TTGTATAATG GTGCTGGCCA
1401 TGTTGTTGTT TTAATCAGTT GCCTCTTTTT AAAAGAAATT TTTATGGAAA
1451 ACACATTCAA CTATCATTAA AAAAATGAAG TTAAGCTGTT GGGACCATTT
1501 CTTTAAGATT TAACAAAAGT TCAGCCTTTT AGGTAGTTGA AGGGAAGTAC
1551 ACCCCGTATT CAGCACATGT TGAGTTTTCT ACACCAGGAA TTTTCAATAT
1601 GTATATTGAT GAAAACAAGC TCAATTCAAA CTGGACAGTT TTAAGATAAT
1651 GTTAAAATCA GCACTTTTAG AGACAACGAA GGCCAAGAAT CAGTACAGTA
1701 GTATTCCAAA ATGATTTTCT CTAGAAATTT GAAAGTAGAT CGAACAGAAT
1751 GTTGTCAACC GCCTACCAGT ACAATCTTTT GTGGAAGATA CTTTGAAATC
1801 ACTTTCTACT TTGTTAGTAA AGTTCTGTCT TTCCAGAGCT GCAAGTTTTA
1851 AAGTGTTACT TATACAGACC AACCAAGAAT AGTGCTGAAT TAAGTGGCAT
1901 TTAGTATCTA GAAGCCATTT TGATCCAAGA AGCTACTTAA GTGTCAAAGT
1951 CAGCATGCAG CACATGTAGC TTTTCTGTAA ACAAGGGTGT GATATGAAAG
2001 CTGCTTTTTT AAGAAGAGTA AAAGCACATT CCATATACGT AAGTGAATTT
2051 TAAAAATAAA TTGAGGCAAA CAGTTAAGTT TTATTTTTAG AGCAACAAGT
2101 TAACTGTAAA TATTTTAATG TTAGTTTGCT CATCTATGAT CTGAGATCAT
2151 GCCGAAGTGA GAAAAATCTC CCCAAAATAC AATTTAATGC ATTGGGAAAA
2201 AAAAACTTTA ACAGTAATTC CAGCCACAAT CTTTAGATCA CCCTTGTAAT
2251 GTGTTACGGG TCCATTTTTC CTGGAATCGT TTAATCTAAA GCAGTTTCCC
2301 CTGTTTTGGA GATTTTGTAG TTAATTTTAA TTTTGGCTAT TGTTTGGAAA
2351 AGATGAGCTG TCTGTGTAGA TATGAAGTAT AGTTTTTTCC ATAAAACAGA
2401 TGTTTATTTT GTATTAAAAA ATACCACTGT ACTTGTTTTA CACCATTTGT
2451 ATACATGTGG TGATATTAAT GCTAAACTGT AAAATTCAGG AATTAAAATG
2501 TGACCCTGTA ATTCCAAAAA AAAAAAAA



BLAST Results 



Entry AF016448_8 from database TREMBL:

gene: "F41E6.3"; Caenorhabditis elegans cosmid F41E6.
Score = BTOt P = 5.0e-32i identities = 73/lflM-i positives = 
llfl/lflM-. 
frame +3 

30 

Entry HS21125b from database EMBL : 
human STS SHGC-lSfiMM- 
Score = ^77-. P = 5.5e-35-. identities = 1^/202 

Medline entries

No Medline entry

Peptide information for frame 3

ORF from 132 bp to 980 bp; peptide length: 283
Category: putative protein
Classification: no clue

1 MNKDLLRRVL VLMNSKHTFL ALCALRFMRR IIGLKDEFYN RYITKGNLFE
51 PVINALLDNG TRYNLLNSAV IELFEFIRVE DIKSLTAHIV ENFYKALESI
101 EYVQTFKGLK TKYEQEKDRQ NQKLNSVPSI LRSNRFRRDA KALEEDEEMW
151 FNEDEEEEGK AVVAPVEKPK PEDDFPDNYE KFMETKKAKE SEDKENLPKR
201 TSPGGFKFTF SHSASAANGT NSKSVVAQIP PATSNGSSSK TTNLPTSVTA
251 TKGSLVGLVD YPDDEEEDEE EESSPRKRPR LGS
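A peptide listing of this kind can be cross-checked against the cDNA by translating from the first base of the ORF. The following is a minimal sketch, assuming the standard genetic code and that the ORF begins at the ATG at nucleotide 132 of the insert. The names `CODON_TABLE` and `translate` are hypothetical helpers, not part of any tool named in the text.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First 30 nucleotides of the ORF (positions 132-161 of the insert)
orf_start = "ATGAACAAGG ACTTGCTAAG AAGAGTCTTG"
print(translate(orf_start))  # MNKDLLRRVL
```

The output matches the first ten residues of the frame 3 peptide, which confirms the reading frame and the start coordinate.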
BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphamy2_12i1, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphamy2_12i1, frame 3

Report for DKFZphamy2_12i1.3



ELENGTHJ 32b 
miO 372L.1-1D 



(EpIJ S.t.0 

EH0M0L3 TREI1BL:AFDlbMMa_fi gene: "F41Eb.3"i Caenorhabdi tis 

20 elegans cosmid FMlEb- le-3t> 

CFUNCAT3 01-05. QM regulation of carbohydrate utilization ES. 

cerevisiaei YNL201cJ 2e-0fl 

EBLOCKS J BLDD35? Histone H2B proteins 

EBLOCKSJ BPD2232B 

25 CBLOCKSJ PRD1D73C 

CBLOCKSJ BP03D5DC 

CBL0CKS3 BP035ADF 

CBLOCKSJ PRODB^F 

EKbU All_Alpha 

30 EKIO L0lif_C0f1PLEXITY 1D-M3 * 

SEQ IVGSNKNNTICPDNYQTAQLLALILELLTFCVEHHTYHIKNYIMNKDLLRRVLVLMNSKH

SEG - xxxxxxxxx • 

35 PRD cccccccccccccchhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhccch 

SEQ TFLALCALRFMRRIIGLKDEFYNRYITKGNLFEPVINALLDNGTRYNLLNSAVIELFEFI

SEG 

PRD hhhhhhhhhhhhhhhhccchhhhhccccccchhhhhhhhhcccccccccchhhhhhhhhh 

40 

SEQ RVEDIKSLTAHIVENFYKALESIEYVQTFKGLKTKYEQEKDRQNQKLNSVPSILRSNRFR

SEC - 

PRD hheeehhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhcccccccccccccccchhh 

SEQ RDAKALEEDEEMWFNEDEEEEGKAVVAPVEKPKPEDDFPDNYEKFMETKKAKESEDKENL

SEG xxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhccccccceeeeeeeccccccccccccchhhhhhhhhhhhcccccc 

SEQ PKRTSPGGFKFTFSHSASAANGTNSKSVVAQIPPATSNGSSSKTTNLPTSVTATKGSLVG

50 SEG 

PRD ccccccccceeeeccccccccccccceeeeecccccccccccccccccccccccccccee 

SEQ LVDYPDDEEEDEEEESSPRKRPRLGS

SEG xxxxxxxxxx- 

55 PRD eeccccccchhhhhcccccccccccc 

(No Prosite data available for DKFZphamy2_12i1.3)
(No Pfam data available for DKFZphamy2_12i1.3)
group: amygdala derived

DKFZphamy2_13gl c l encodes a novel 281 amino acid protein without
similarity to known proteins.

The novel protein contains a PROSITE ASP_PROTEASE motif and seems
to be expressed ubiquitously.

No informative BLAST results; no predictive Prosite, Pfam or SCOP
motif.

The new protein can find application in studying the expression
profile of amygdala-specific genes.

unknown protein
perhaps complete cds.
Pedant: SIGNAL_PEPTIDE

Sequenced by EMBL
Locus: /chromosome="12p13.3"
Insert length: 2754 bp
Poly A stretch at pos. 2743, polyadenylation signal at pos. 2724
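The reported polyadenylation signal position can be checked by searching the 3' end of the insert for the canonical hexamers. The following is a minimal sketch under two assumptions that the text does not state: that the annotation looked for AATAAA or ATTAAA, and that reported positions count from zero. The function `find_polya_signals` and its `offset` parameter are hypothetical.

```python
def find_polya_signals(seq, offset=0, motifs=("AATAAA", "ATTAAA")):
    """Return sorted (zero-based position, motif) pairs for every motif hit.

    `offset` is the zero-based position of seq[0] within the full insert.
    """
    seq = seq.upper().replace(" ", "")
    hits = []
    for motif in motifs:
        i = seq.find(motif)
        while i != -1:
            hits.append((offset + i, motif))
            i = seq.find(motif, i + 1)
    return sorted(hits)


# Last 54 nucleotides of the 2754 bp insert (1-based positions 2701-2754)
tail = "TCGGCTCAGC TGCTTCTCCA TTGGAATAAA CTCTTGTTTC TCTAAAAAAA AAAA"
print(find_polya_signals(tail, offset=2700))  # [(2724, 'AATAAA')]
```

Under the zero-based assumption the hit falls at position 2724, which agrees with the signal position reported for this insert.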

1 GCAATCTCGG GAAATTGGAG ACTGACGCGG CTGCTCCTGC ATGTTATTTA
51 TTTTTCCTCT TTCCCTCCCG TGGAGACCCT CCTGTTGGAA AGAGAGCTGC
101 AGCACGGGAC AGAGACAGGC AGGAAGAAGC AGAGAGGACT CGGTGACGCC
151 CCCACCGAGC AGCCCCTGGC CCACTCCTCC AGCAGGGGCC ATGAGCACCA
201 AGCAGGAGGC CAGGAGAGAT GAGGGAGAAG CCAGGACGAG GGGGCAGGAG
251 GCACAGCTTC GAGACCGAGC CCACCTGAGC CAGCAGCGCC GGCTCAAACA
301 GGCCACCCAG TTCCTGCACA AGGACTCGGC CGACCTGCTC CCGCTGGACA
351 GCCTCAAGAG GCTCGGCACC TCCAAGGACT TGCAGCCGCG CAGTGTGATC
401 CAAAGACGCC TGGTGGAGGG AAACCCGAAT TGGCTTCAGG GGGAGCCTCC
451 CCGGATGCAG GACCTGATTC ATGGCCAGGA GAGCAGGAGG AAGACCAGCA
501 GGACAGAGAT TCCAGCTCTT CTGGTCAACT GCAAGTGCCA GGACCAGCTG
551 CTTAGAGTGG CCGTTGACAC AGGCACCCAA TACAATCGGA TCTCTGCTGG
601 ATGTCTCAGC CGCCTGGGGT TAGAGAAAAG GGTCCTAAAA GCCTCAGCTG
651 GGGACCTGGC CCCTGGGCCC CCAACCCAGG TGGAGCAGTT GGAGCTACAG
701 CTGGGGCAGG AGACTGTGGT GTGCTCGGCA CAGGTGGTGG ATGCTGAGAG
751 TCCTGAATTC TGCCTGGGGC TGCAGACTCT GCTTTCTCTC AAGTGCTGCA
801 TCGACCTGGA GCACGGAGTG CTGCGGCTGA AAGCCCCGTT CTCAGAGCTA
851 CCCTTCCTGC CTTTGTACCA AGAGCCTGGC CAGTGACTGC TGTCTCAGTC
901 AGTCCCCAGA GGGAAAGACC TTGCCTTAGA AGAAGAGGCG TGTGGGGAAC
951 GGGGGCTCTT GAAGCCAGGT AGCTGGGGAC TATGGTGTCT GCCCTTCCAA
1001 TCACCTCCCT GACCCCTGCT GTCGCATTTT CCCCAGCTGG CCGCATTCCT
1051 CTCTGCTTCT CAGCAGCTGT CCTACTCCCC AGGACGAGTT TTCACTAGAG
1101 GGCCCACGAT GCCAGGATTC TGATTCATCT TCCTCCCAAG AAAAGCAAAG
1151 CCAAATCAAG ACCACAGATA GGAACCTAAG CACAATGGGG TGCCTGCTTG
1201 GGCTGGGTCG AAGGCTCTGC TGACTGCTGT CCTTGTCCAT CACCCAATAC
1251 CACCCCAAAC ACAACTCAAC TTCCCACACC ACCATGTCTC TCACCACACC
1301 TTCTGGGCCT CATTATCTCC CACAACTAGA CCGCCATGCC TCACCAACCT
1351 ATGTCCCTGG ACCTCCTGGT GTCTGCCTCT CGGAGTCTGT GCACATCTGC
1401 TCACAGTTGA GTGGGGGAAG AAACAGCCAG AATTCAATAC AACAAAGAGC
1451 GGGAGTTAGT ATAGGAATGT CCATCTCATA AGGCTGAGAG CTATTTTTTC
1501 CTGTGGCTGC AAATGTCTGA AGCCAGTTAG TTTGATTACC CTGTGCAAAA
1551 CCTTGGACAT ACTTCTGCTA TTAACGCTAT AGGTATTTAT CCGTTTCCAC
1601 TGGCTTTTTG TACCCACCGA GCCCCTGAGC CTTGCGTGTG TGTGTGTGGA
1651 AGAGCCTTGT AGAGAACTGC TCCTGTGAGG CAGACAGGAC AGTGAGGTTG
1701 TCACCACTCA GACTTCACCT ATTCAGCATT CTTTCTGATT TCTAGAACTA
1751 TCCACCTCAT TAGGCCTTCT TCCTATCCCC ATCTCTGGCC TCTTGAGCTT
1801 AAGCTTGTAT TGTCCTGGAA TCAGTGGCTT TCTAACCCCC TGCCAGGCTT
1851 TGCCAAAGCA AAAAGACAGA GGCTTTTTTT TTTTTTTTAA AGTTTGGGGT
1901 CTGTCAGGAG ACAGAGGCTT TTTTGAATTC ACTGTGAAGA GAAGAACCCG
1951 AACCTTAAGA CGCCAGATCC CTGAGAGTCT TTCTGGCTGG TTTGAGTCTC
2001 TCAAATCATG GATTAGGAGT AAAGAAAGAG GCAGGCGCAA TGGCTCATGC
2051 CTGTAATCCC AGCACTTTGG GAGGCTGAGG TGGGTGGATC ACTTGAGGTC
2101 AGGAGTTTGA GACCAGCCTG GGTAATATGG CAAAACCCCA TCTCTACTAA
2151 AAAATACAAA AATTAGCCAG GTATGGTGGT GAACACCTGT AATCCCAGCT
2201 ACTTGGAAGG CTGAGGCATA GGAGTTGCTT GAACCTGGGA GATGGGGGTT
2251 GTAGTGAGCC AAGTTCGTGC CATCGGACTC CAGCCTGGGT GAAGGAGTGA
2301 GACCCTGTCT CCAAAAACAA ACAAAAAAGG AGCAGAGAAA GACAGTGGTA
2351 CAGCTAACCT GAACAAGGGA ACTGGGACCG TTGGGCTGAA ACAGTCTTGA
2401 GCCTGGGGTT GACTGGGTTA GAGAAGAACC GGGATGCAAG GAGCTGCCTG
2451 TGACACCTGG CCTGCCCTTT CTCAGCTGCC TCCCCTGCCC TTTCTCAGCT
2501 GCCTCCCCTG CCCTCAGAAG GAAAGGAGAG GGCTCACTTA TCACTTGTGC
2551 CATAGCACCT GGTCTCAAAA TCCTAAAAGC TTTCCTCGCC CTCACTGCCT
2601 TGCTCCACAA GGTCCACTTT CCTGGGTCTT GTGCTGTGCC TTTCCTTGTC
2651 TGCCTCCTGC TGCTTCTGTA ACTGCAGACC CCAGGCCCAA TTGCAAGCCC
2701 TCGGCTCAGC TGCTTCTCCA TTGGAATAAA CTCTTGTTTC TCTAAAAAAA
2751 AAAA






BLAST Results 

No BLAST result 

Medline entries 

No Medline entry 

Peptide information for frame 2 



ORF from 41 bp to 883 bp; peptide length: 281
Category: putative protein
Classification: no clue

Prosite motifs: ASP_PROTEASE (173-184)

1 MLFIFPLSLP WRPSCWKESC STGQRQAGRS REDSVTPPPS SPWPTPPAGA
51 MSTKQEARRD EGEARTRGQE AQLRDRAHLS QQRRLKQATQ FLHKDSADLL
101 PLDSLKRLGT SKDLQPRSVI QRRLVEGNPN WLQGEPPRMQ DLIHGQESRR
151 KTSRTEIPAL LVNCKCQDQL LRVAVDTGTQ YNRISAGCLS RLGLEKRVLK
201 ASAGDLAPGP PTQVEQLELQ LGQETVVCSA QVVDAESPEF CLGLQTLLSL
251 KCCIDLEHGV LRLKAPFSEL PFLPLYQEPG Q



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphamy2_13gn, frame 2

PIR:S50646 hypothetical protein YER143w - yeast (Saccharomyces
cerevisiae), N = 1, Score = 90, P = 0.26

TREMBL:RNDDb0_1 product: "DNA (cytosine-5-)-methyltransferase";
Rattus norvegicus mRNA for DNA (cytosine-5-)-methyltransferase, partial
cds., N = 1, Score = 81, P = 0.59






>PIR:S50646 hypothetical protein YER143W - yeast (Saccharomyces
cerevisiae)

Length = 428

HSPs:

Score = 90 (13.5 bits), Expect = 3.0e-01, P = 2.6e-01
Identities = 28/112 (25%), Positives = 48/112 (42%)



(2uery : 155 TEIPALLVNCKC(2D<2LLRVAVDTGTf2YNRISAGCLSRLGLEKRVLKASAGD- 
— LAPGPP 211 

T++P L +N + + ++ VDTG a +S + GL + + K G+ 

+ G 

35 Sbjct: m 

TflVPMLYINIEINNYPVKAFVDTGAtf TTIMSTRLAKKTGLSRNIDKRFIGEARGVGTGKI 25fi 

<3uery: 212 XXXXXXXXXXXXXXXX- 
CSAt3VVDAESPEFCLGL(3TLLSLKCCIDLEHGVLRL 2b3 
40 CS V + D + + +GL L C + DL + 

VLR+ 

Sbjct: 25T IGRIHti2A(2VKIETt2YIPCSFTVLDTDI- 
DVLIGLDI1LKRHLACVDLKENVLRI 310 



Pedant information for DKFZphamy2_13gn «* frame 2 
Report for DKFZphamy2_13gl c i - 2 



ELENGTH3 2fil 

EJIIO 31330.^7 

EpIJ fl-75 

55 EBL0CKSJ PRODDED 

EBL0CKSID BP01T21G 

EPR0SITE3 ASP_PR0TEASE 

EK Id ID All_Alpha 
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EKbD SIGNAL_PEPTIDE 1? 

CKQO L0U_C0I1PLEXITY V. 

SEQ MLFIFPLSLPWRPSCWKESCSTGQRQAGRSREDSVTPPPSSPWPTPPAGAMSTKQEARRD

SEG xxxxxxxxxxxx 

PRD ccccccccccccccceeeccccccccccccceeecccccccccccccccchhhhhhhhhh 

SEQ EGEARTRGQEAQLRDRAHLSQQRRLKQATQFLHKDSADLLPLDSLKRLGTSKDLQPRSVI

10 SEG 

PRD ccccccchhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccccccccccchhhh 

SEQ QRRLVEGNPNWLQGEPPRMQDLIHGQESRRKTSRTEIPALLVNCKCQDQLLRVAVDTGTQ

SEG - 

15 PRD hhhhhccccccccccccccccccccccccccccccccchhhhhhhchhhhhhhhhhcccG 

SEQ YNRISAGCLSRLGLEKRVLKASAGDLAPGPPTQVEQLELQLGQETVVCSAQVVDAESPEF

SEG xxxxxxxxxxxxxxxx-.- 

PRD eeecccchhhhhhhhhhhhhhhccccccccccchhhhhhhhccceeeeccceeecccccc 



20 



25 



30 



SEQ CLGLQTLLSLKCCIDLEHGVLRLKAPFSELPFLPLYQEPGQ

SEG 

PRD cccchhhhhhhhhhcchhhhhhhcccccccccccccccccc 



Prosite for DKFZphamy2_13gn.2
PS00141 173->185 ASP_PROTEASE PDOC00128

(No Pfam data available for DKFZphamy2_13gn - 2 ) 
group: intracellular transport and trafficking

DKFZphamy2_14b5 encodes a novel 771 amino acid protein which
shows 61% identity to the human TYL protein and 48% identity to
the human Tic protein.

Both proteins show similarity to Sec7 of Saccharomyces
cerevisiae, which functions in vesicular trafficking. The new
protein also shows significant similarity to human ARNO3, which
is involved in the control of Golgi structure and function.
DKFZphamy2_14b5 is predominantly expressed in the CNS and germ
cells.

The new protein can find application in diagnosis/therapy of
diseases related to vesicular trafficking, e.g. in synapses of the
central nervous system, and in studying expression profiles.

similarity to TYL protein (Homo sapiens)

Sequenced by EMBL

Locus: /map="445.7 cR from top of Chr5 linkage group"

Insert length: 4528 bp

Poly A stretch at pos. 4511, polyadenylation signal at pos. 4489

1 CTCGCTCAGC CTCTCCACAT CGCGGCTCCG GCACCTGAAG GGACGCGGGC 

35 51 GGGCGCGGGC AGCTCCGACC GGCGGCGGCG GGGCGGGACA GGCAGCCCGG 

101 CGGCCTCCC IGGCCCCGCC GTGAGAGGCC GGACCCGCGG CGGGGACCAG 

151 CAGCGGTCT • AGGAGTCCC AGGAGCAGCC AGGACAGGCG GAAGCAGTGG 

201 CTGCCATGC : AGGACAAG CTCTTATCTG CAGTGCCTGA GGAAGGCGAT 

251 GCCACCCG1 '^CCCGGTCC AGAGCCTGAA GAGGAGCCAG GGGTCCGGAA 

40 301 TGGGATGGl k TGAGGGCC TGAACAGCAG CCTCTGCAGC CCAGGGCACG 

351 AGCGAAGGGs : CCCCAGCG GACACTGAGG AACCCACGAA GGACCCAGAT 

401 GTGGCCTTCC 1 GGCCTCAG CCTTGGCCTC TCTCTCACCA ATGGCCTAGC 

451 CCTGGGGCCA ^ ACTTGAACA TTCTGGAAGA TTCAGCGGAG TCCAGGCCCT 

501 GGAGGGCTGG CGTGCTGGCA GAGGGGGACA ATGCTTCCAG GAGCCTCTAC 

45 551 CCAGATGCTG AGGACCCTCA GCTGGGGTTG GATGGTCCCG GGGAGCCAGA 

tOl TGTGCGGGAT GGCTTCAGCG CCACGTTTGA GAAGATTCTG GAGTCAGAGC 

kSl TGC1GCGGGG CACCCAGTAC AGCAGCCTCG ACTCCCTAGA CGGGCTGAGC 

701 CTCACGGATG AGAGCGACAG CTGCGTCAGC TTCGAGGCCC CCCTCACACC 

751 CCTCATCCAG CAGCGGGCCC GTGACAGCCC TGAGCCAGGG GCTGGGTTGG 

50 601 GCATTGGGGA CATGGCGTTT GAGGGGGACA TGGGGGCAGC TGGTGGTGAT 

A51 GGGGAGCTGG GCAGCCCCCT GCGGCGCTCC ATCTCCAGCA GCCGCTCTGA 

^01 GAATGTCCTG AGCCGCCTGT CTCTCATGGC CATGCCCAAT GGATTCCATG 

^51 AAGATGGCCC TCAGGGCCCA GGGGGGGMG AGGATGATGA TGAGGAGGAC 

1001 ACGGACAAGT TGCTGAACTC AGCCAGTGAC CCCAGCCTGA AGGATGGCCT 

55 1051 GTCAGACTCA GACTCTGAGC TCAGCAGCTC GGAGGGGTTG GAGCCTGGTA 

1101 GTGCAGACCC TCTGGCCAAC GGGTGCCAGG GGGTCAGTGA AGCTGCTCAT 

1151 CGGCTGGCAC GCCGTCTCTA CCACCTCGAG GGCTTCCAGC GCTGTGATGT 

1201 GGCCCGGCAG CTGGGCAAGA ACAACGAGTT TAGCAGGCTG GTGGCCGGGG 
1251 AGTACCTCAG TTTCTTCGAC TTCTCGGGCT TGACTCTGGA CGGAGCACTC
1301 AGAACATTCT TGAAGGCCTT CCCGCTGATG GGGGAGACAC AAGAGCGTGA
1351 GCGGGTCCTC ACACACTTCT CCCGCCGGTA CTGCCAGTGC AACCCTGATG
1401 ACAGCACTTC GGAAGATGGG ATCCACACGC TCACCTGTGC CCTGATGCTG
1451 CTCAACACGG ACCTGCACGG CCACAACATT GGCAAAAAGA TGTCCTGTCA
1501 GCAATTCATT GCCAACTTGG ACCAGCTGAA TGATGGCCAA GACTTTGCCA
1551 AAGACCTGCT GAAGACCCTT TACAACTCCA TCAAGAATGA AAAGCTGGAA
1601 TGGGCCATTG ATGAGGATGA GCTGAGGAAA TCCCTGTCTG AGCTGGTGGA
1651 TGACAAGTTC GGGACAGGCA CGAAGAAGGT GACGCGAATC CTGGATGGTG
1701 GCAACCCCTT CCTGGATGTC CCACAGGCGC TCAGTGCCAC CACCTACAAG
1751 CACGGCGTCC TGACCCGGAA GACTCACGCT GACATGGATG GCAAGAGGAC
1801 GCCCCGTGGG AGGCGTGGCT GGAAGAAATT CTACGCAGTG CTCAAAGGGA
1851 CCATCCTGTA CCTGCAGAAG GATGAGTACA GGCCTGACAA AGCTCTATCG
1901 GAGGGTGACC TGAAGAACGC CATTCGCGTG CATCACGCTC TGGCCACCAG
1951 GGCCTCTGAC TACAGCAAGA AGTCCAACGT GCTGAAGCTT AAGACAGCCG
2001 ACTGGAGGGT ATTCCTCTTC CAGGCACCGA GCAAGGAAGA AATGCTGTCC
2051 TGGATCCTCA GGATCAACCT GGTGGCAGCC ATCTTCTCTG CCCCGGCCTT
2101 CCCAGCCGCT GTCAGCTCCA TGAAGAAGTT CTGTCGGCCC CTGCTGCCCT
2151 CCTGCACCAC CCGCCTCTGC CAGGAGGAGC AACTGCGGTC TCATGAGAAT
2201 AAGTTGAGGC AGCTGACTGC GGAGCTGGCC GAACACAGGT GTCACCCAGT
2251 CGAGAGGGGC ATCAAGTCCA AGGAGGCCGA GGAGTACCGG TTGAAGGAGC
2301 ACTATCTCAC CTTCGAGAAA AGCCGTTATG AGACCTATAT CCACCTCCTG
2351 GCTATGAAAA TCAAAGTGGG CTCAGATGAT CTGGAGGGGA TTGAGGCCCG
2401 GCTGGCCACT CTGGAAGGGG ATGACCCTTC TCTCCGGAAG ACACATTCAA
2451 GCCCTGCCCT CAGCCAGGGC CATGTGACTG GCAGCAAAAC CACAAAGGAT
2501 GCCACTGGGC CTGATACTTA GCTGACATGG ATTTGCAGAC CCCAGGGTGG
2551 GCAGATGTCT CCAGTGGGGT CAGTGAGCAC AATTCCAGCC AGGGGCCACT
2601 TGGACCAAGC TCCAGTCAGT TGATGGGCAG CTAGAGGGGT GCAGAAAGCC
2651 TGTGGGCCCA GGAGATGGAG ATGCCGTTTG TGGCGTTGAT CTCCTTGCGT
2701 CCTTGGGCAT CTCCGGGCAT CAGACCCTCT CCCTGGCCCT TGTTTTCCTC
2751 TCCACCATGG AGCCTCATTT TGTAGGCCAG TTGTGTGCAT GCTCTAGACA
2801 CCACCTCGCT GGAGAAGCTG GAAGGGCTGT TGTCTTCCCA GGTCTTTCTC
2851 TTCTCATCAA GCTCCTCTCC TCATCTTTTT TGTGTGTGAG GGCAGGTCTT
2901 GACTCTAGGT CTCAGCTGGA ACCCCACCCT TTCTCCTCCT CCTTCCTCTG
2951 AGTTGACCAG CAGCAGGTCT GCCGACCACC AGCACCATCC TCTCCTCCCA
3001 GCAGCCTCCA GAACCATGCC CAGGTCTCCT GCCTCACATC ACAATAATCT
3051 GGGACCCAGG CTTGTGCCCT TTCAGTGTAA AGCTGACTCC ATCACATGTG
3101 CATCCACTTC TTTTCATCCA TTGAGATCAC ACTGCCTCCT TTTTATACAG
3151 ACACAAATAT ACATCTATAA GAATAATATA TACATAAGGA ACCCCTGAAA
3201 GATGGTTTTG GAACTGGAAT CAGTTAGAGG ATGAAATCAG ATAAAGGAAA
3251 AGCCTATTTT GGAGCTTGCC CTGTTAGGAA GGATGGCTGC ACCTGGCCCC
3301 CTGGCATTCC TGACGCTCTA GGAGGGAAGG GGGAGGCAGT GCTGGCCTCC
3351 CTTGCCCTGT TTTTCCCTCT TCCAGCTGAC CTGTGACTTA TACTGCTCTT
3401 ACCGATGATA CTTTTGGAAA AAATAGAGCG TGTATGCACC GCCCCGTTTG
3451 TCCCATGGAT ATCCTGGGGT GTGAGTCGGA TGGGACCACG GCCCTGTTTA
3501 TATTTGGGTC TTTATGTTGG TGCTGCCAGG TCTCTGAGCT CCAGAGGTGG
3551 CCTCTTGGAC AGATCTACTG CTATAGGAAT AAAAGACACT CTGTCTCGCA
3601 AATGGCTGCT TGTCAACAAG CCCAAAGATG CTTGTCGGAG GACGGTTATG
3651 GAAGCCCTTA ATTCTTGGTT GTGGGAAAAG GTGGAATGAC AAGTTATTGA
3701 TTGTTTTTCT GTCGCTATTT CTTTCATTTG TCTAGTGAAT CAGAAAGGCT
3751 TAGCCAAGGC CACATCTGGG AAGAGTGGAG AAATTTGCCA CTTGACGATC
3801 ACGGATTAGC TAGCACCTTT AAGCCCTGCA TTTCTCCAAC TGACAAGTGG
3851 GTGGGGGTGA TGGCACATTC AGTGTGGCTA TGAAGAGCGA ATCCTCTCTA
3901 TTGTTTAAAT AGATTACTGT AGTTTGGCCA GGAATTTGGC GTCAGTGGTA
3951 ACACACTTAG TTAATAAAAT AAGCCAGGCT TGCAACTAAG TATCTAACTT
4001 TACAGGCCCA CTCACATTTG AGGCAAGGGG CTATTGAGTA TGTGGAGAGA
4051 TGTAGTGATT TAAATTCAGA TTATTTAAGT TGGATCAGGT GAAGTGTGTT
4101 TTAGACCCAA ACCATCTGGC CCCTTCGTTT TGCTCAGAGG AAGTAAATGT



4201 
4251 

4301 

4351 
4401 
4451 
4501 



TCACTTAAAT 
GTACTTTCCC 
CCCACCCTTC 
TCTGGCAGCT 
TCAGTAAAAA 
GTAAATTATA 
ACATACACAA 
GTCAAAGTTC 



GAAATTGAAA 
CATGCTGCCT 
ACTTCCCGAG 
^TGGACAGAT 
AATGTTTAGT 
TAC ATGTACT 
AAATAGAAAT 
CAAAAAAAAA 



ACGCCATGTG 
CAAAAGTTCT 
GGCGGGTGAG 
GTGCTTCCTG 
TCACTTCCTT 
ACTGTACTAA 
TTAAAAAAGA 
AAAAAAAA 



GCACCACAAA 
GTGAGTTTCG 
TGGAGAGCAG 
AGCATGGGTT 
AATTGTATAA 
AATATTATGT 
TGAGATGAAA 



AGAGCTCTCT 
GGGTCAGTGT 
AGCCAGGAGC 
GTGCCTCCCA 
TTATTTATTT 
ACATT ATAAA 
ATAAATCTAA 






BLAST Results 



No BLAST result 



Medline entries 



20 ^flDflbHfiS: 

Perletti L, Talarico D, Trecca D, Ronchetti D, Fracchiolla NS,
Maiolo AT, Neri A. Identification of a novel gene, PSD, adjacent
to NFKB2/lyt-10, which contains Sec7 and pleckstrin-homology
domains.

Genomics 46: 251-259 (1997)



Peptide information for frame 2 






ORF from 206 bp to 2518 bp; peptide length: 771
Category: similarity to known protein
Classification: Cell signaling/communication
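The ORF coordinates and the peptide length reported for each frame can be cross-checked arithmetically. The following is a minimal sketch, assuming the coordinates are 1-based and inclusive, that the stop codon is not counted, and that the values for this clone read 206 and 2518; the function name `orf_codons` is hypothetical and not part of the report.

```python
def orf_codons(start: int, end: int) -> int:
    """Return the number of complete codons in an ORF.

    Assumes 1-based, inclusive coordinates that exclude the stop codon.
    """
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3


# Coordinates as read from the report for frame 2 (an assumption, see above).
print(orf_codons(206, 2518))  # 771
```

Under these assumptions the span of 2313 bases corresponds to 771 codons, which agrees with the stated peptide length.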



1 MEEDKLLSAV 
51 RGTPADTEEP 
101 AGVLAEGDNA 
151 RGTflYSSLDS 
201 GDMAFEGDPIG 
251 GPdGPGGDED 
301 DPLANGCflGV 
351 LSFFDFSGLT 
401 TSEDGIHTLT 
451 LLKTLYNSIK 
501 PFLDVP(3ALS 
551 LYLflKDEYRP 
bOl RVFLFtfAPSK 
b51 TTRLC(3EE(3L 
701 LTFEKSRYET 
751 ALSflGHVTGS 



PEEGDATRDP 
TKDPDVAFHG 
SRSLYPD AED 
LDGLSLTDES 
AAGGDGELGS 
DDEEDTDKLL 
SEAAHRL ARR 
LDGALRTFLK 
CALMLLNTDL 
NEKLEUAIDE 
ATTYKHGVLT 
DKALSEGDLK 
EENLSUILRI 
RSHENKLRdL 
YIHLLAP1KIK 
KTTKDATGPD 



GPEPEEEPGV 
LSLGLSLTNG 
PC3LGLDGPGE 
DSCVSFEAPL 
PLRRSISSSR 
NSASDPSLKD 
LYHLEGFfiRC 
AFPLIIGETflE 
HGHNIGKKNS 
DELRKSLSEL 
RKTHADMDGK 
NAIRVHHALA 
NLVAAIFSAP 
TAELAEHRCH 
VGSDDLERIE 
T 



RNGflASEGLN 
LALGPDLNIL 
PDVRDGFSAT 
TPLItfflRARD 
SENVLSRLSL 
GLSDSDSELS 
DVAR<2LGKNN 
RERVLTHFSR 
Ct2t3FIANLDG3 
VDDKFGTGTK 
RTPRGRRGUK 
TRASDYSKKS 
AFPAAVSSI1K 
PVERGIKSKE 
ARLATLEGDD 



SSLCSPGHER 
EDSAESRPUR 
FEKILESELL 
SPEPGAGLGI 
HAflPNCFHED 
SSEGLEPGSA 
EFSRLVAGEY 
RYCtSCNPDDS 
LNDGC2DF AKD 
KVTRILDGGN 
KFYAVLKGTI 
NVLKLKTADbl 
KFCRPLLPSC 
AEEYRLKEHY 
PSLRKTHSSP 



BLASTP hits
No BLASTP hits available

Alert BLASTP hits for DKFZphamy2_14b5, frame 2

PIR:G01205 TYL protein - human, N = 2, Score = 1421, P = 8.6e-150

TREMBL:AB023159_1 gene: "KIAA0942", product: "KIAA0942 protein";
Homo sapiens mRNA for KIAA0942 protein, partial cds, N = 1,
Score = 1251, P = 2.3e-127

TREMBL:U63127_1 gene: "TIC", product: "Tic"; Human SEC7 homolog
Tic (TIC) mRNA, complete cds, N = , Score = 1050, P = 4.6e-106



>PIR:G01205 TYL protein - human
Length = 645

HSPs:

Score = 1421 (213.2 bits), Expect = 8.6e-150, Sum P(2) = 8.6e-150
Identities = 280/452 (61%), Positives = 336/452 (74%)
duery : 301 

DPLANGCaGVSEAAHRLARRLYHLEGFGROVARflLGKNNEFSRLVAGEYLSFFDFSGLT 3bD 
D L+NG + EAA RLA+RLY L+GF++ DVAR LGKNN+FS+LVAGEYL 

30 FF F+G+T 

Sbjct: Ibb 

DTLSNG(2KADLEAA(3RLAKRL YRLDGFRKADVARHLGKNNDFSKLVAGEYLKFF VFTGHT 225 
duery: 3bl 

35 LPGALRTFLKAFPLflGETfiERERVLTHFSRRYCfiCNPDDSTSEDGIHTLTCALMLLNTDL 420 

LD ALR FLK LUGETCERERVL HFS+RY (3CNP+ +SEDG 
HTLTCALNLLNTDL 
Sbjct: 22b 

LD(2ALRVFLKELALriGET(2ERERVLAHFS(3RYF(2CNPEALSSEDGAHTLTCALnLLNT])L 2fi5 

40 

Query * 421 

HGHNIGKICriSC(3(2FIANLD(2LNI>G(2DFAKDLLKTLYNSIKNEKLEWAIDEDELRKSLSEL 45D 

HGHNIGK+M+C FI NL+ LNDG DF ++LLK 
LY+SIKNEKL+UAIDE+ELR+SLSEL 
45 Sbjct: 2flb 

HGHNIGKRHTCGDFIGNLEGLNDGGDFPRELLKALYSSIKNEKL(3liIAIDEEELRRSLSEL 345 

tfuery: 4fil VDDKFGTGTKKVTRIL 

DGGNPFLDVPflALSATTYKHGVLTRKTHADMDGKRTPRGR 53b 
50 D K + RI G +PFLD+ A YKHG L RK HAD J> 

++TPRG+ 

Sbjct: 34b ADPN 

PKVIKRISGGSGSGSSPFLDLTPEPGAAVYKHGALVRKVHADPDCRKTPRGK 401 
55 duery: 537 

RGUKKFYAVLKGTILYLdKDEYRPDKALSEGDLKNAIRVHHALATRASDYSKKSNVLKLK 5^b 

RGUIK F+ +LKG ILYLdK+EY+P KALSE +LKNAI 
+HHALATRASDYSK+ +V L+ 
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Sbjct: ^□5 

RGUKSFHGILKGMILYLflKEEYKPGKALSETELKNAISIHHALATRASDYSKRPHVFYLR Mbl 
<2uery: 5^7 

TADURVFLF(3APSKEEriLSUILRINLXXXXXXXXXXXXXXXSriKKFCRPLLPSCTTRLCc3 bSL 
TADWRVFLFtfAPS E + M Sbll RIN+ S KKF 

RPLLPS TRL (3 
Sbjct: MbE 

TADURVFLF(3APSLE(3n(3SlJITRINVVAAf1FSAPPFPAAVSSt3KK:FSRPLLPSAATRLS(3 5E1 



duery: bS7 

EEtfLRSHENKLRGLTAELAEHRCHPVERGIKSKEAEEYRLKEHYLTFEKSRYETYIHLLA 71b 
EE(2 + R+HE KL+ + +EL EHR + + + KEAEE R KE YL FEKSRY 

TY LL 
15 Sbjct: SEE 

EEtfVRTHEAKLKAnASELREHRAAflLGKKGRGKEAEEflRdKEAYLEFEKSRYSTYAALLR 5&1 

fluery: 717 MKIKVGSDDLERIEARLATLEGDDPSLRKTHSSPAL 7SE 
+K+K GS++L+ +EA LA + L +HSSP+L 

20 Sbjct: 5A2 VKLKAGSEELD A VEA ALAtfAGSTEDGLPPSHSSPSL b!7 

Score = b3 bits)i Expect = fi-be-lSD-i Sum P(E) = a.be-150 

Identities = (E^*)-. Positives = E3/LM <35*> 

25 fluery: 13E DVRDGFSATFEKILESELLRGTtfYXXXXXXXXXXXXXXXXX- 
CVSFEAPLTPLI(3(3RARD l^D 

J> J> FS FE ILES +GT Y +FE P P 

+ 

Sbjct: Ifl 

30 DGPDSFSCVFEAILESHRAKGTSYTSLASLEALASPGPT(3SPFFTFELPP(2PPAPRPI>PP 77 

<2uery: 1^1 SPEP 1^ 
+ P P 

Sbjct: 7fi APAP fil 

35 

Pedant information for DKFZphamyE_mb5i frame E 

40 Report for DKFZphamyE_lMb5 • E 

CLENGTH3 771 
EMU J 

45 EpIJ S-OM 

EHOMOLJ PIR : G01E05 TYL protein - human le-lSa 

EFUNCAT3 30-0^ organization of intracellular transport vesicles 

ES- cerevisiaei YDR170cJ 5e-£E 
EFUNCAT3 30-Da organization of golgi ES. cerevisiae-i YDR170c3 

50 5e-EE 

EFUNCATJ 3D-03 organization of cytoplasm ES- cerevisiae-i 
YDR170c3 5e-E£ 

EFUNCATJ Da. 07 vesicular transport Cgolgi network-! etc-) ES- 
cerevisiae-. YPR17Dc3 5e-EE 
55 EFUNCAT3 n unclassified proteins ES- cerevisiae-i YPRDTScIB 

He-DM 

EBL0CKS3 BL01E77B 
EBL0CKS3 BPDE373F 






GZBLOCKS J 


PR00b55C 












IT D 1 A/'U'^* ~n 


nom nunc 












itdi /\ r v 


HKULJcc: id 












ITRI ftri^^Tl 

ILOL.vV.KoJl 


R D n P L ULT\ 












IL DL. \j v_no ji 


DpnniQi a 
HrvUUdnjiA 












ILOLvV.No J 


DMdlBSMM 












IL DL_vV.No JJ 


PFD13b^B 












ILDL.V7 V-N o Ji 


PF-J^t^A 














dlbtn 2 . 


m- 


1-1.5 beta-s 


pectnn 


(Emous 


muscul us ) 


brain le-BT 












fT D T IP I.I Tl 

ELr IKnU J 


transmembrane 


protein 


le- 


50 




IL ourr M 1 1 JI 


Caenorhabditis 


ele 


gans KDbH? - k 


pr o tei n 


7e-cH 


[SUPFAM] pleckstrin repeat homology 7e-
[PFAM] PH (pleckstrin homology) domain
[KW] Irregular
[KW] 3D
[KW] LOW_COMPLEXITY 18.42 %









20 SE<3 MEEI>KLLSAVPEECl)ATRI>PGPEPEEEP6VRISIGnASEGLNSSLCSPGHERRGTPAI>TEEP 

SEG xxxxxxxxxx - 

Ibtn- 

25 SEG TKDPDVAFHGLSLGLSLTNGLALGPDLNILEDSAESRPURAGVLAEGDNASRSLYPDAED 

SEG xxxxxxxxxxxxxxx 

Ibtn- 

30 SEG PaLGLJ>GPG£PJ>^RdGFSATFEKIL£SELLRGTQySSLJ>SLJ>GLSLTJ>ESI>KC\/SF£APL 

SEG xxxxxxxxxxxxxxxxx 

Ibtn- 

35 SE<2 TPLI(3t3RARDSPEPGAGLGIGDI1AFEGDriGAAGGDGELGSPLRRSISSSRSENVLSRLSL 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

Ibtn- 

40 SE<2 nAnPNGFHEI)GPt3GPGGDEl>I>I>EEI>T10KLLNSASI>PSLK:DGLSDSDSELSSSEGLEPGSA 

SEG • . . . xxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx. 

Ibtn- 

45 SEG 3>PLANGCc3GVSEAAHRLARRLYHLEGFQRCDVAR(3LGKNNEFSRLVAGEYLSFFI>FSGLT 

SEG 

Ibtn- 

50 SE<2 LDGALRTFLKAFPLriGET(3ERERVLTHFSRRYC(3CNPDDSTSEI>GIHTLTCALI1LLNTl>L 

SEG 

lbtn- 

55 SE<2 HGHNIGKKf1SC(3(2FIANLD(2LNI>G(3BFAKDLLKTLYNSIKNEKLElJAIDEI)ELR < kSLSEL 

SEG 

Ibtn- 
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SE<3 VDDKFGTGTKKVTRILDGGNPFLDVP(2ALSATTYKHGVLTRKTH ADnDGKRTPRGRRGWK 

SEG . . 

lbtn- EEEEEEEEEETTTEET — 

5 TTTCEE 



SE£J KF YAVLKGTILYLflKDEYRPDKALSEGDLKNAIRVHHALATRASDYSKKSNVLKLKTADb) 

SEG 

lbtn- EEEEEEETTEEEEECCHHHHHHCCBTTT- 
10 TCCEETTTTEEEETTTTTCTTTEEEEETTTT 

SE<2 RVFLFfl APSKEEMLSWILRINLVAAIFSAPAFPAAVSSMKKFCRPLLPSCTTRLCOEEflL 

SEG ------ xxxxxxxxxxxxxxx - 

Ibtn- 

15 CEEEEECCCHHHHHHHHHHHH 

SEfl RSHENKLROLT AELAEHRCHPVERGIKSKEAEEYRLKEHYLTFEKSRYETYIHLLAMKIK 

SEG 

Ibtn- 

20 

SE<3 VGSDDLERIEARLATLEGDDPSLRKTHSSPALStfGHVTGSKTTKDATGPDT 

SEG 

lbtn- 

25 

(No Prosite data available for DKFZphamy2_14b5.2)

Pfam for DKFZphamy2_14b5.2

HMM_NAME PH (pleckstrin homology) domain
35 WW 

*dvIREGIi)nyKUgswrkstg nldqrRUFvLrndpnrLiYYkddk 

+ ++G + +++ + *+ U++ ++VL++ + L++ 

KD+ 

fluery 512 TTYKHGVLTRKTHADH]>GKRTPRGRRGUK<FYAVLKG — 

40 TILYLOKDE- 557 



dekPr- YMlIdld • cUrMidVEidbJmmdndHCFilUtrq . rtYYF 

+P+ ++++ + ++D ++ +++ +++T + 

45 R+++F 

fluery SSfl -YRPDKALSEGDLKNAIRVHHALATRASDYSKK- 

SNVLKLKTADURVFLF b05 

WW (JAeNeEEMmeUMsalrRalw* 
50 <2A+++EEM +U+ 1+ + + 

Quer\/ hOh (2APSKEEMLSUILRINLVAA b2S 
group: transcription factors

DKFZphamy2_14m16.p1 encodes a novel 252 amino acid protein with
similarity to the homeotic protein emx2 of man, mouse and zebra
fish as well as to the gene "empty spiracles" of Drosophila
melanogaster.

Homeobox genes are known to play important roles in
developmental processes. In zebrafish, emx2 mRNAs are found in the
dorsal telencephalon, parts of the diencephalon and the otocyst.

The human homologue Emx2 appears to be already expressed in 8.5
day embryos. It is also expressed in the presumptive cerebral
cortex, olfactory bulbs, and in some neuroectodermal areas in the
embryonic head, including olfactory placodes in earlier stages and
olfactory epithelia later in development. Mutants of the D.
melanogaster gene "empty spiracles" display spiracles devoid of
filzkorper, no antenna and an open head.

The new protein can find application in modulating the expression
of genes controlled by this transcription factor and in modulation
of neuronal development.

strong similarity to homeotic protein emx2 (Homo sapiens)
perhaps differential splicing

30 

Sequenced by EMBL

Locus: /chromosome="10"

Insert length: 2416 bp

Poly A stretch at pos. 2398, polyadenylation signal at pos. 2373

1 GAAAAAAAAA GAAAAAAAAA GAAAAAAAAT TACCCCAATC CACGCCTGCA
51 AATTCTTCTG GAAGGATTTT CCCCCCTCTC TTCAGGTTGG GCGCGTTTGG
101 TGCAAGATTC TCGGGATCCT CGGCTTTGCC TCTCCCTCTC CCTCCCCCCT
151 CCTTTCCTTT TTCCTTTCCT TTCCTTTCTT TCTTCCTTTC CTTCCCCCCA
201 CCCCCACCCC CACCCCAAAC AAACGAGTCC CCAATTCTCG TCCGTCCTCG
251 CCGCGGGCAG CGGGCGGCGG AGGCAGCGTG CGGCGGTCGC CAGGAGCTGG
301 GAGCCCAGGG CGCCCGCTCC TCGGCGCAGC ATGTTCCAGC CGGCGCCCAA
351 GCGCTGCTTC ACCATCGAGT CGCTGGTGGC CAAGGACAGT CCCCTGCCCG
401 CCTCGCGCTC CGAGGACCCC ATCCGTCCCG CGGCACTCAG CTACGCTAAC
451 TCCAGCCCCA TAAATCCGTT CCTCAACGGC TTCCACTCGG CCGCCGCCGC
501 CGCCGCCGGT AGGGGCGTCT ACTCCAACCC GGACTTGGTG TTCGCCGAGG
551 CGGTCTCGCA CCCGCCCAAC CCCGCCGTGC CAGTGCACCC GGTGCCGCCG
601 CCGCACGCCC TGGCCGCCCA CCCCCTACCC TCCTCGCACT CGCCACACCC
651 CCTATTCGCC TCGCAGCAGC GGGATCCGTC CACCTTCTAC CCCTGGCTCA
701 TCCACCGCTA CCGATATCTG GGTCATCGCT TCCAAGGGAA CGACACTAGC
751 CCCGAGAGTT TCCTTTTGCA CAACGCGCTG GCCCGAAAGC CCAAGCGGAT
801 CCGAACCGCC TTCTCCCCGT CCCAGCTTCT AAGGCTGGAA CACGCCTTTG
851 AGAAGAATCA CTACGTGGTG GGCGCCGAAA GGAAGCAGCT GGCACACAGC
901 CTCAGCCTCA CGGAAACTCA GGTAAAAGTA TGGTTTCAGA ACCGAAGAAC
951 AAAGTTCAAA AGGCAGAAGC TGGAGGAAGA AGGCTCAGAT TCGCAACAAA
1001 AGAAAAAAGG GACGCACCAT ATTAACCGGT GGAGAATCGC CACCAAGCAG
1051 GCGAGTCCGG AGGAAATAGA CGTGACCTCA GATGATTAAA AACATAAACC
1101 TAACCCCACA GAAACGGACA ACATGGAGCA AAAGAGACAG GGAGAGGTGG
1151 AGAAGGAAAA AACCCTACAA AACAAAAACA AACCGCATAC ACGTTCACCG
1201 AGAAAGGGAG AGGGAATCGG AGGGAGCAGC GGAATGCGGC GAAGACTCTG
1251 GACAGCGAGG GCACAGGGTC CCAAACCGAG GCCGCGCCAA GATGGCAGAG
1301 GATGGAGGCT CCTTCATCAA CAAGCGACCC TCGTCTAAAG AGGCAGCTGA
1351 GTGAGAGACA CAGAGAGAAG GAGAAAGAGG GAGGGAGAGA GAGAAAGAGA
1401 GAGAAAGAGA GAGAGAGAGA GAGAGAAAGC TGAACGTGCA CTCTGACAAG
1451 GGGAGCTGTC AATCAAACAC CAAACCGGGG AGACAAGATG ATTGGCAGGT
1501 ATTCCGTTTA TCACAGTCCA CTTAAAAAAT GATGATGATG ATAAAAACCA
1551 CGACCCAACC AGGCACAGGA CTTTTTTGTT TTTTGCACTT CGCTGTGTTT
1601 CCCCCCCATC TTTAAAAATA ATTAGTAATA AAAAACAAAA ATTCCATATC
1651 TAGCCCCATC CCACACCTGT TTCAAATCCT TGAAATGCAT GTAGCAGTTG
1701 TTGGGCGAAT GGTGTTTAAA GACCGAAAAT GAATTGTAAT TTTCTTTTCC
1751 TTTTAAAGAC AGGTTCTGTG TGCTTTTTAT TTTGATTTTT TTTCCCAAGA
1801 AATGTGCAGT CTGTAAACAC TTTTTGATAC CTTCTGATGT CAAAGTGATT
1851 GTGCAAGCTA AATGAAGTAG GCTCAGCGAT AGTGGTCCTC TTACAGAGAA
1901 ACGGGGAGCA GGACGACGGG GGGGCTGGGG GTGGCGGGGG AGGGTGCCCA
1951 CAAAAAGAAT CAGGACTTGT ACTGGGAAAA AAACCCCTAA ATTAATTATA
2001 TTTCTTGGAC ATTCCCTTTC CTAACATCCT GAGGCTTAAA ACCCTGATGC
2051 AAACTTCTCC TTTCAGTGGT TGGAGAAATT GGCCGAGTTC AACCATTCAC
2101 TGCAATGCCT ATTCCAAACT TTAAATCTAT CTATTGCAAA ACCTGAAGGA
2151 CTGTAGTTAG CGGGGATGAT GTTAAGTGTG GCCAAGCGCA CGGCGGCAAG
2201 TTTTCAAGCA CTGAGTTTCT ATTCCAAGAT CATAGACTTA CTAAAGAGAG
2251 TGACAAATGC TTCCTTAATG TCTTCTATAC CAGAATGTAA ATATTTTTGT
2301 GTTTTGTGTT AATTTGTTAG AATTCTAACA CACTATATAC TTCCAAGAAG
2351 TATGTCAATG TCAATATTTT GTCAATAAAG ATTTATCAAT ATGCCCTCAC
2401 AAAAAAAAAA AAAAAA
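The polyadenylation signal annotated for this insert can be located by a plain string search near the 3' end. The sketch below is illustrative only: it assumes the canonical AATAAA hexamer is the signal meant by the annotation, and it uses the 50 bases of the row labelled 2351 with the spacing removed. The names `ROW_START` and `row` are hypothetical.

```python
ROW_START = 2351  # 1-based position of the first base in the row

# Row 2351 of the listing, joined without the column spacing.
row = "TATGTCAATGTCAATATTTTGTCAATAAAGATTTATCAATATGCCCTCAC"

offset = row.find("AATAAA")       # zero-based offset within the row
one_based = ROW_START + offset    # 1-based position in the insert

print(offset, one_based)  # 23 2374
```

The hexamer starts at 1-based position 2374, one base after the reported position 2373. That would be consistent with the report counting from zero or pointing at the preceding base, although the listing itself does not state which convention applies.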



30 



BLAST alert EMBL/EMBLNEli) 



35 EMBLNEU:AL133353 Human DNA sequence *** SEQUENCING IN PROGRESS 
*** from 

clone RPll-Mfl3Flln N = En Score = 31D&-, P = 5-3e-13M 

EMBL s HSEMX2 H-sapiens EMX2 mRNAn N = In Score = 2355. P = 5-le- 
40 101 

Medline entries

92331606:

Simeone A, Gulisano M, Acampora D, Stornaiuolo A, Rambaldi M,
Boncinelli E.

Two vertebrate homeobox genes related to the Drosophila empty
spiracles gene are expressed in the embryonic cerebral cortex.
EMBO J 1992 Jul;11(7):2541-50



55 



Peptide information for frame 1
ORF from 331 bp to 1086 bp; peptide length: 252

Category: questionable ORF

Classification: unset

Prosite motifs: HOMEOBOX_1 (167-210)

1 MFQPAPKRCF TIESLVAKDS PLPASRSEDP IRPAALSYAN SSPINPFLNG
51 FHSAAAAAAG RGVYSNPDLV FAEAVSHPPN PAVPVHPVPP PHALAAHPLP
101 SSHSPHPLFA SQQRDPSTFY PWLIHRYRYL GHRFQGNDTS PESFLLHNAL
151 ARKPKRIRTA FSPSQLLRLE HAFEKNHYVV GAERKQLAHS LSLTETQVKV
201 WFQNRRTKFK RQKLEEEGSD SQQKKKGTHH INRWRIATKQ ASPEEIDVTS
251 DD



15 Alert BLASTP hits for DKFZphamy2_14mlb -, frame 1 

PIR:I51737 homeotic protein emx2 - zebra fish; N = Score = 

753-. P = 

le-105 

20 

PIR:S22722 homeotic protein emx2 - human (fragment) i N = !•> Score 

• 7L.3-. P = l-3e-7S 

25 TREnBL:0LA132i403_l gene: "emx2"; product: "Emx2 protein", 
Oryzias 

latipes mRNA for Emx2 protein-, partial; N = 2-t Score = 513-. P = 
M-Se-72 

30 « 

>PIR:S22722 homeotic protein emx2 - human (fragment)
Length = 156

HSPs:

Score = 763 (114.5 bits), Expect = 1.3e-75, P = 1.3e-75
Identities = 144/144 (100%), Positives = 144/144 (100%)

(2uery: 101 

40 FASflfiRDPSTFYPWLIHRYRYLGHRFflGNDTSPESFLLHNALARKPKRIRTAFSPSflLLR Ibfl 

FASflflRDPSTFYPULIHRYRYLGHRFOGNDTSPESFLLHNALARKPKRIRTAFSPSOLLR 
Sbjct: 15 

FASflflRDPSTFYPWLIHRYRYLGHRFfiGNDTSPESFLLHNALARKPKRIRf AFSPSflLLR 7M 

45 

fiuery: lt.1 

LEHAFEKNHYVVGAERKc2LAHSLSLTET(2VKVlJF(3NRRTKFKRi2KLEEEGSl>S(2(2KKKGT 226 

LEHAFEKNHYVVGAERK(3LAHSLSLTET(3VICVUFi2NRRTKFKR(3KLEEEGSDS(2(3KKKGT 
50 Sbjct: 75 

LEHAFEKNHYVVGAERKflLAHSLSLTET«2VKVUF(3NRRTKFKR(3KLEEEGS]>S(2(3KKKGT 13M 

fluery: 221 HHINRWRIATKCJASPEEIDVTSDD 252 
HHINRWRIATKcSASPEEIDVTSDD 
55 Sbjct: 135 HHINRWRIATKfiASPEEIDVTSDD 156 

Pedant information for DKFZphamy2_lMmlb i frame 1 



SSPINPFLNG 
PHALAAHPLP 
PESFLLHNAL 
LSLTET(2VKV 
ASPEEIDVTS 
Report for DKFZphamy2_14m16.1



ELENGTHJ 

EMU 

EpID 

EH0M0L3 

113 

EFUNCAT3 
YHL027wIB 5e-05 
EFUNCATJ 



3ta 

10 - 51 

PIR:I£1737 homeotic protein emxE - zebra fish le 



30-10 nuclear organization 



ES- 



cerevisiaei 



ES- 



04-n other transcription activities 
cerevisiaei YI1LD27wJ 5e-05 

EFUNCAT3 D3-07 pheromone response! mating-type 

determination-i sex-specific proteins ES- 
cerevisiaei YCRDT7w2 5e-04 

EFUNCAT J DM.D5-01.QM transcriptional control ES • 

cerevisiaei YDLlObcJ 7e-0 L » 

EFUNCAT3 OLOM-OM regulation of phosphate utilization 

ES. cerevisiaei YDLlObcJ 7e-D4 

EFUNCAT J D1.D3.13 regulation of nucleotide metabolism 

ES- cerevisiaei YDLlDbcJ 7e-DM 

EBL0CKS3 PROOOM^D 

EBL0CKSJ PROD^D^H 

EBLOCKSID PRDQi*a7F 

EBLOCKSID PR0U71LG 

EBLOCKSJ BL0D035C 

EBLOCKSD BL000S7 'Homeobox' domain proteins 

EBLOCKSID PR0002bA 
EBLOCKSJ BL00032C 

EBL0CKS3 BL0D032B 'Homeobox' antennapedia-type protein 

ESCOPj dlau7bl 1 • 4 • 1 . 1 . b Pit-1 POU homeodomain Pit-1 

Pit-1 ERat CRattu Se-lb 

ESCOPJ dlyrna_ 1-4-1.1-2 mating type protein Al 

Homeodomain mat alpha 2e-15 

ESCOPJ dlenh_ 1-4-1. 1-1 engrailed Homeodomain 

E(Drosophila melanogaster 2e-13 



EPIRKLO 
EPIRKLO 
EPIRKIO 
EPIRKliU 
EPIRKliU 
EPIRKLO 
EPIRKIO 
EPIRKliU 
EPIRKliU 
EPIRKliU 
ESUPFAM 
ESUPFAHJ 
ESUPFAH3 
ESUPFAfO 
ESUPFA>0 
ESUPFAMJ 
ESUPFANJ 
ESUPFAPU 
EPROSITEJ 
EPFANJ 
EKIiU 



nucleus le-b7 
heart 3e-10 
DNA binding le-b? 
leukemia 3e-15 
alternative splicing le-10 
proto-oncogene 3e-15 
transcription factor be-11 
embryo Te-12 

transcription regulation le-b? 
homeobox le-b? . 
homeobox homology le-b7 
homeotic protein Hox A5 ?e-10 
homeotic protein Hox B3 3e-lD 
homeotic protein Hox B2 3e-ll 
homeotic protein Hox Bl 7e-ll 
unassigned homeobox proteins le-b? 
homeotic protein goosecoid 4e-lD 
homeotic protein Hox D4 Te-12 
H0I1E0B0X_1 1 
Homeobox domain 
Irregular 
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(EKIO 3D 

(EKIO L0U_C0riPLEXITY 25. '/. 

5 SE(2 

EKKRKKKKKNYPNPRL(2ILLEGFSPLSSGUARLV(2DSRDPRLCLSLS1_PPPFLFPFLSFL 
SEG 

•xxxxxxxx. - xxxxxxxxxxxxxxxxx 

IfjlA 

10 — 

SE<2 

SSFPSPHPHPHPK(3TSPC2FSSVLAAGSGRRR(2RAAVARSIilEPRAPAPRRSHF£}PAPKRCF 
SEG 

15 xxxxxxxxxxxx xxxxxxxxxxxxxxxx 

IfjlA 

SE<2 

20 TIESLVAKDSPLPASRSEDPIRPAALSYANSSPINPFLNGFHSAAAAAAGRGVYSNPDLV 
SEG 

• - • xxxxxx 

IfjlA 

25 * * ' 

SE<2 

FAEAVSHPPNPAVPVHPVPPPHALAAHPLPSSHSPHPLFASfiiSRDPSTFYPIJLIHRYRYL 
SEG 

- • - • xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

30 IfjlA 

SE(3 

GHRFUGNDTSPESFLLHNALARtCPKRIRTAFSPSflLLRLEHAFEKNHYVVGAERKflLAHS 
35 SEG 

IfjlA 

- - -CCCCCCCCCCHHHHHHHHHHHHHTTTTCHHHHHHHHHH 

40 SEC 

LSLTET(2VKVlJF(2NRRTKFKR(!KLEEEGSl>Sfi(3KKKGTHHINRWRI ATK(3ASPEEIDVTS 
SEG 

IfjlA * * " * 

45 HCCCHHHHHHHHHHHHHHHHHH 



SEfl DD 

SEG 

IfjlA 

50 

Prosite for DKFZphamyS_mmlb . 1 
PSDDD5? E=17->3B1 H0I1E0B0X_1 PDOCD005? 

55 

Pfam for DKFZphamy2_14m16.1
HMM_NAME Homeobox domain

Hflfl 

5 *RRRpRTtFTre(3LdELEREFHf NrYPTRqRREELAi3mLNLTERi3VKIWF 

+ R RT+F + +12L++LE +F+ N + Y+ + + R 

+LA + + L+LTE + C3VK+UF 
Query 5bM 

PKRIRT AFSPSQLLRLEHAFEKNHY VVGAERK6LAHSLSLTET(2VKVWF 31S 

10 

HI1M GNRRHKUKRMH* 

(2NRR+K KR + 
Query 313 <3NRRTKFKRt2K 323 
group: amygdala derived

DKFZphamy2_16e14.p3 encodes a novel 328 amino acid protein,
similar to carbonic anhydrase-related proteins.

A similar cDNA encoding a protein of the same length was
identified in sheep. This protein shows a strong signal sequence,
which indicates that it is a secreted protein. The new protein
belongs to a protein family, which was designated carbonic
anhydrase-related protein XI (CA-RP XI), encoded by CA11 (human)
and Car11 (mouse, rat). Despite potentially inactivating changes
in the active-site residues, CA-RP XI is evolving very slowly in
mammals, a property indicative of an important function, which
has also been observed in the two other "acatalytic" CA isoforms,
CA-RP VIII and CA-RP X.

No informative BLAST results; no predictive prosite, pfam or SCOP
motif.

The new protein can find application in studying the expression
profile of amygdala-specific genes.

similarity to carbonic anhydrase-related protein (Homo sapiens)
ESTs ending at appr. 1800 have polyA-signal
Sequenced by EMBL

Locus: /map = rT 17qBMi 5-13cR from GATAmCOS" 

Insert length: 2267 bp

Poly A stretch at pos. 2252, polyadenylation signal at pos. 2231



1 GGATGGAAAT AGTCTGGGAG GTGCTTTTTC TTCTTCAAGC CAATTTCATC
51 GTCTGCATAT CAGCTCAACA GAATTCACCA AAAATCCATG AAGGCTGGTG
101 GGCATACAAG GAGGTGGTCC AGGGAAGCTT TGTTCCAGTT CCTTCTTTCT
151 GGGGATTGGT GAACTCAGCT TGGAATCTTT GCTCTGTGGG GAAACGGCAG
201 TCGCCAGTCA ACATAGAGAC CAGTCACATG ATCTTCGACC CCTTTCTGAC
251 ACCTCTTCGC ATCAACACGG GGGGCAGGAA GGTCAGTGGG ACCATGTACA
301 ACACTGGAAG ACACGTATCC CTTCGCCTGG ACAAGGAGCA CTTGGTCAAC
351 ATATCTGGAG GGCCCATGAC ATACAGCCAC CGGCTGGAGG AGATCCGACT
401 ACACTTTGGG AGTGAGGACA GCCAAGGGTC GGAGCACCTC CTCAATGGAC
451 AGGCCTTCTC TGGGGAGGTG CAGCTCATCC ACTATAACCA TGAGCTATAT
501 ACGAATGTCA CAGAAGCTGC AAAGAGTCCA AATGGATTGG TGGTAGTTTC
551 TATATTTATA AAAGTTTCTG ATTCATCAAA CCCATTTCTT AATCGAATGC
601 TCAACAGAGA TACTATCACA AGAATAACAT ATAAAAATGA TGCATATTTA
651 CTACAGGGGC TTAATATAGA GGAACTATAT CCAGAGACCT CTAGTTTCAT
701 CACTTATGAT GGGTCGATGA CTATCCCACC CTGCTATGAG ACAGCAAGTT
751 GGATCATAAT GAACAAACCT GTCTATATAA CCAGGATGCA GATGCATTCC
801 TTGCGCCTGC TCAGCCAGAA CCAGCCATCT CAGATCTTTC TGAGCATGAG
851 TGACAACTTC AGGCCTGTCC AGCCACTCAA CAACCGCTGC ATCCGCACCA
901 ATATCAACTT CAGTTTACAG GGGAAGGACT GTCCAAACAA CCGAGCCCAG
951 AAGCTTCAGT ATAGAGTAAA TGAATGGCTC CTCAAGTAGG GAACAAAGCC
1001 AAGAAGAATC CCACCTCAGT GAAATGCTAC AACTGTGAAT TGACGTAACC
1051 TAGAATGTCC CCCTTCTTGC TTCTCTCTCC TTCTTTCCCC CAAGCCTCAT
1101 TCATTCTTGG GATTGGCCCT TTCTTCATGA AAAGTGTCTG CAAAACCATG
1151 GCAGAGGAAT ACATCTCTCA CACATACTCA CAAACACACA CACAAGCACT
1201 TGCACATACA TACAAACACA TGCAAACATA CCTACACACA CACACACTCT
1251 TACAACCTCC ATCATGGGAA GTCAAGTTTC AGAAACAAAA GTCTCATTCA
1301 TAAGAGGTCT TAGAAGAAAA TAACCAGTTA ACGTGATTTC AATTTTGATA
1351 CCGTTTTCCT GAACTAATAA ATCTACCCAA TGAGACTTTT CAGCCTTTGT
1401 ACATACAAAA TTCTTCCAAA AGAGAGAGGA GAAAATACAG CTCTGATGGC
1451 ATCAAACGGA CTTTGCATCA AGTAATTTCA GATAGTGTCC TAGGATCCTT
1501 TGAGGGTGCT GGTAGCAGGT GAGCAGGACA AAGTTGACCA AGGACACTTA
1551 TTTCTAGATT ATGATTCTTC TGTTTACTCA ACAATTTACA AAGAAAAAAA
1601 GGACAGACAT TGAAGAGCTA CACATTGTAT ATATATCACC ACAGACTATA
1651 AGGAAATGGA ATTATTTCCC TCTTTGTCAC ATATCTGTAG TAGGATTTGC
1701 CAAGATCAGA AATGATCCAT TTGCTGTTTC TTGTTTTCCA AAGGTCATAC
1751 ATTGTGTTTG GTTATTGTTA CCAGCTCAAT AAATGTGTTT AACGAGTTAA
1801 TTTCATTTTT CTGGCTTTGG TCTGTTCTCC TTCCTTACAG GCTAAGCCCT
1851 GGCTCCATGC AACTGCATTC TTTGATTTCA CTTGTTCCTT CATCTACATG
1901 TTTTGTTCAT TTGCAGCCAG TTTTTACTGA GTTTGTGGCA ATCAGGAATG
1951 CATTTGCTAA GCAAGTATGA CTTTAATTCC ACTCCATGGC TCAATCATTC
2001 ACATGAGGTG AGCTTCAGCC TGAGATAGCA GGCGACAGAC TTCTTGCGTT
2051 TCAAAACTGC CATGCCCCCC TGTGATGCTC CCGTGAAGGA ATGCACTTTG
2101 CCTTGTAAGT TCCTGGGAAA GGGGTATGTT TTCTCTCCAG GTGCAGCCAG
2151 ATCTCACAAA GTACAAAACG AATGCCTTTC TTTTCTTGTT TATAATGGTC
2201 ACTCACTGTG TTTGGTTACT GTCAAGAAAT CAATAAATGT GTTTAACAAG
2251 TCAAAAAAAA AAAAAAA



30 
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BLAST alert EMBL/ENBLNEb) 

EMBL: AFDb4B5M Homo sapiens map 17q2H^ 5-13cR from GATAH1C05 
repeat 

region-i complete sequence - V N = 2n Score = fl7aMn P = □ 

EMBLNEl): AC005Bfi3 Homo sapiens chromosome 17 clone RPll-TSflEll map 
17-. 

DORKING DRAFT SEQUENCE-, 2 ordered pieces. =i N = 3-» Score = b2t0-» P 
= D 

Medline entries 



^0=573^: 

Lovejoy DA, Hewett-Emmett D, Porter CA, Cepoi D, Sheffield A,
Vale WW, Tashian RE. Evolutionarily conserved, "acatalytic"
carbonic anhydrase-related protein XI contains a sequence motif
present in the neuropeptide sauvagine: the human CA-RP XI gene
(CA11) is embedded between the secretor gene cluster and the DBP
gene at 19q13.3. Genomics 1998 Dec 15;54(3):484-93
Peptide information for frame 3

ORF from 3 bp to 986 bp; peptide length: 328
Category: similarity to known protein
Classification: unclassified

1 MEIVWEVLFL LQANFIVCIS AQQNSPKIHE GWWAYKEVVQ GSFVPVPSFW
51 GLVNSAWNLC SVGKRQSPVN IETSHMIFDP FLTPLRINTG GRKVSGTMYN
101 TGRHVSLRLD KEHLVNISGG PMTYSHRLEE IRLHFGSEDS QGSEHLLNGQ
151 AFSGEVQLIH YNHELYTNVT EAAKSPNGLV VVSIFIKVSD SSNPFLNRML
201 NRDTITRITY KNDAYLLQGL NIEELYPETS SFITYDGSMT IPPCYETASW
251 IIMNKPVYIT RMQMHSLRLL SQNQPSQIFL SMSDNFRPVQ PLNNRCIRTN
301 INFSLQGKDC PNNRAQKLQY RVNEWLLK
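The relation between the insert sequence and the frame 3 peptide can be illustrated by translating the start of the ORF. This is a minimal sketch that assumes the standard genetic code, a 1-based start position of 3, and that the first 50 bases of the insert read as given in `first_row`. The names `translate`, `CODON_TABLE` and `first_row` are hypothetical helpers, not part of the original analysis.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str, start: int) -> str:
    """Translate from a 1-based start until a stop codon or the end."""
    protein = []
    for i in range(start - 1, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First 50 bases of the insert, assumed reading of the listing.
first_row = "GGATGGAAATAGTCTGGGAGGTGCTTTTTCTTCTTCAAGCCAATTTCATC"
print(translate(first_row, 3))  # MEIVWEVLFLLQANFI
```

The sixteen residues obtained this way match the start of the peptide listed for frame 3, which supports reading the ORF start as position 3.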



Alert BLASTP hits for DKFZphamy2_lbem -. frame 3 

20 PIR:JE037S carbonic anhydrase-related protein - humane N = l-i 
Score = 

137-. P = 4-be-m 

SUISSNEU:CAHB_SHEEP CARBONIC ANHYDRASE-RELATED PROTEIN 2 
25 PRECURSOR 

(CARP 2) (CA-RP II) (CA-XD.^ N = 1, Score = 135-. P = 7-Se-m 



>PIR:JE037S carbonic anhydrase-related protein - human 
30 Length = 328 

HSPs: 

Score = 137 (lMO-b bits)-. Expect = M-be-IM-. P = H-be-IH 
35 Identities = 11.1/237 (555:)-. Positives = 223/237 (77*) 

Query: 3D 

EGldUAYKEVVflGSFVPVPSFblGLVNSAWNLCSVGKRflSPVNIETSHMIFDPFLTPLRINT 61 
E WU+YK+ +<2G+FVP P FWGLVN+Alil+LC+VGKR(2SPV+ + E 
40 +++DPFL PLR++T 
Sbjct: 32 

EDWbJSYKDNLQGNFVPGPPFUGLVNA AWSLCAVGKRQSPVDVEVKRVLYDPFLPPLRLST 11 
fluery: ID 

45 GGRKVSGTCIYNTGRHVSLRLDKEHLVNISGGPIITYSHRLEEIRLHFGSEDSOGSEHLLNG 14=) 

GG K+ GT+YNTGRHVS +VN+SGGP+ YSHRL E+RL FG+ D 

GSEH +N 
Sbjct: 12 

GGEKLRGTLYNTGRHVSFLPAPRPVVNVSGGPLL YSHRL SELRLLFGARDGAGSEH(3INH 151 

50 

fiuery: ISO 

dAFSGEVflLIHYNHELYTNVTEAAKSPNGLVVVSIFIKVSDSSNPFLNRIILNRDTITRIT 201 

a FS EVflLIH+N ELY N + A++ PNGL ++S+F+ V+ 
+SNPFL+R+LNRDTITRI+ 
55 Sbjct: 152 

<2GFSAEV<2LIHFN(2ELYGNFSAASR6PNGLAILSLFVNVASTSNPFLSRLLNRDTITRIS 211 
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Query* 210 

YKNDAYLLr3GLNIEELYPETSSFITYDGSnTIPPCYETASUIiriNKPVYITRri(3riHSLRL 2b^ 

YKNDAY L<2 L + + E L + PE+ FITY GS + + PPC ET + IT 

+GHHSLRL 
5 Sbjct: 212 

YKNI>AYFL(3I>LSLELLFPESFGFITY(36SLSTPPCSETVTbJILIDRALNITSL(3I1HSLRL 271 

tfuery: 270 LSl3NC2PS(2IFLSP1SPNFRPVt3PLNNRCIRTNII\IFSL(3GK]>C--PNNR 314 
LS(3N PStflF S+S N RP+<2PL +R +R N + . + C PN R 
10 Sbjct: 272 LS(3NPPS(2IF<3SLSGNSRPLt2PLAHRALRGNRDPRHPERRCRGPNYR 31fl 

Pedant information f or DKFZphamy2_lfc>em frame 3 

15 Report for DKFZphamy2_lbem - 3 

ELENGTHJ 3Bfi 

EMIO 37St3-l c l 

20 EpI3 fl-25 

EHOMOLU PIR:JED37S carbonic anhydrase-related protein - 
human le-101 

EBL0CKSJ DNOllOTB 

EBL0CKSH BLODlbBF 

25 EBLOCKSD BLDD1L2E 

EBLOCKSH BLDDlbBD 

EBLOCKSID BLDOlbBC Eukaryot ic-type carbonic anhydrases 
proteins 

EBLOCKSID BLDDlbBA Eukaryot ic-type carbonic anhydrases 
30 proteins 

ESCOPl dlznca_ 2-5b-l-l-3 Carbonic anhydrase Ehuman 
(Homo sapiens Ie-1D3 

ESC0PJ d2cba B-St-l-l-B Carbonic anhydrase Ehuman 

(Homo sapiens Te-T? 

35 EEO M-2-1.1 Carbonate dehydratase le-3b 

EEC3 3-1-3- 4fi Protein-tyrosine-phosphatase 2e-2D 

EPIRKIO blocked amino end fie-2^ 

EPIRKIO carbon-oxygen lyase le-3b 

EPIRKIO zinc le-3b 

40 EPIRKIO polymorphism 2e-20 

EPIRKIO hydro-lyase le-3b 

EPIRKIO transmembrane protein 3e-23 

EPIRKIO tyrosine-specif ic phosphatase 2e-ED 

EPIRKIO brain be-lb 

45 EPIRKIO acetylated amino end le-3b 

EPIRKIO phosphatidylinositol linkage 2e-lT ■ 

EPIRKIO receptor 2e-BD 

EPIRKIO liver 3e-2T 

EPIRKIO phosphoprotein 2e-2D 

50 EPIRKIO saliva Be-21 

EPIRKIO . glycoprotein 2e-22 

EPIRKIO mitochondrion le-35 

EPIRKIO monomer 3e-32 

EPIRKIO alternative splicing be-lb 

55 EPIRKIO lipoprotein Be-n 

. EPIRKIO pyroglutamic acid Be-21 

EPIRKIO metalloprotein be-35 

EPIRKIO muscle Me-31 
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EPIRKtiO membrane protein Ee-n 

EPIRKUO phosphoric monoester hydrolase Ee-£0 

[PIRKIO homodimer 3e-E3 

CSUPFAHi fibronectin type III repeat homology Ee-EO 

5 ESUPFAPU carbonic anhydrase homology le-3b 

ESUPFAMID protein-tyrosine-phosphatase-i receptor type zeta 
be-lb 

ESUPFArO carbonate dehydratase le-3L> 

ESUPFAPO protein-tyrosine-phosphatase-i receptor type gamma 
10 Ee-ED 

ESUPFAMID protein-tyrosine-phosphatase homology Ee-ED 

ESUPFAFO leukocyte common antigen cytosolic domain 

homology Se-EO 

CPFAM3 Eukaryotic-type carbonic anhydrases 

15 EKkO All_Beta 

EKkO 3D 

EKIO SIGNAL_PEPTIDE EE 
20 SE* 

nEIVUEVLFLL(3ANFIVCISA(3(2NSPKIHEGUlilAYKEVVt3GSFVPVPSFlJGLVNSAb)NLC 
lugc- 

25 sza 

SVGKR(3SPVNIETSHniF1>PFLTPLRINTGGRKVSGTnYNTGRHVSLRLDKEHLVNISGG 
luge- . -TTTTCCCEETTTTTEETTTTCEEEEETT- 
TTCEEEEEETTTTEEEEECTTTTTEEEEE 

30 sza 

PI1TYSHRLEEIRLHFGSEDS<2GSEHLLNG(3AFSGEV(3LIHYNHELYTNVTEAAKSPNGLV 
luge- TTCCCEEEEEEEEEETTTTTTCTTTEETTBCCCEEEEEEEEEGG- 
GTTHHHHHCTTTTEE 

35 SE<3 

VVSIFIKVSDSSNPFLNRF1LNRDTITRITYKNl>AYLL(3GLNIEELYPETSSFITYDGSnT 
luge- EEEEEEEEC-CCCGGGHHHH-- 
HHGGGCCTTTEEEETTTTCGGGGCCCCCCEEEEEECCC 

40 SE(3 

IPPCYETASUIinNKPVYITRn^nHSLRLLSi3N(3PS(3IFLSriSI>NFRPV(3PLNNRCIRTN 
lugc- 

TTTTCCCEEEEEECCCEEECHHHHHHHHCCBCCTTTTCCCBTTTTCCCCCCTTTTCCEEC 

45 SE(2 INFSLflGOCPNNRAflKLGYRVNEWLLK 
luge- 



50 



55 



(No Prosite data available for DKFZphamyE_lbelM - 3 ) 

Pfam for DKFZphamyE_lbelM - 3 

HMM„NAME Eukaryot ic-type carbonic anhydrases 

*WCYgeHWGPEHH UHkhYPIAW GDR<2SPINIt2UkearYDPS 
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ti) Y E + QJ+++ + . + „ G R(3SP+NI ++ 

+DP 

duery 33 

UAYKEVV(2GSFVPVPSFUGLVNSAUNLCSVGKR(2SPVNIETSHniFDPF fil 

5 

HMM 

LKPblrv • SYYpaUCrEWelUNNGHSFgVeFDDSMDnSVLsGGPLPgHPYR 

L P+R + + + ++++ ++ N + G + + +D +SGGP++ 

++R 

10 <2uery fi2 LTPLRINTGGRKVSG — TMYNTGRHVSLRLDK- 

EHLVNISGGPNTY-SHR 127 

HMfl 

Lkt3FHFHliJGGASsNDLJGSEHTVI>GmkYPnELHLVHli)NStKYnNYdEA(3dq 
15 L + ++H G S++ + GSEH ++G +++ E+ L+H+N +Y N+ 

EA + + 

duery 12fi LEEIRLHFG-- 

SEDS(2GSEHLLNG(2AFSGEVr2LIHYNHELYTNVTEAAKS 175 
20 HI1H 

PDGLAVIGVFMKVGNYqENPyLtSKVv - - DALdnlKYKGKratflTNFDPsC 

P+GL V+ + F + KV MP L + + + J> + I YK + 

+++ + + 

fluery 17b PNGLVVVSIFIKVS- 

25 DSSNPFLNRHLNRDTITRITYICNDAYLLflGLNIEE 22M 

HMfl 

LLPpPnCRDYUTYPGSLTTPPChECVTUIVCKEPIsISsE(3niilKFRsLLF 

L P+ + TY GS+T+PPC+E UI+ P+ I + an +R 

30 L 

<2uery 225 LYPE-- 

TSSFITYDGSI1TIPPCYETASliIIiriNKPVYITRI1(3l1HSLRLLS(3 272 

HMM NhEGEeeVpUVDNURf^PflPLKhRvVRASF* 
35 N +M DN + RP (3PL + + R +R + 

tfuery 273 N<2PS<2IFLSMSDNFRPV(3PLNNRCIRTNI 3D1 
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5 group: nucleic acid management 

DKFZphamy2_lcl2 encodes a novel 422 amino acid protein with
partial identity to I-kappa-B-related protein and to BRCA1.

I-kappa-B-related protein interacts with transcription factors,
and BRCA1 has a function in DNA damage response. I-kappa-B-alpha
mutations contribute to constitutive NF-kappaB activity in
cultured and primary HRS (Hodgkin/Reed-Sternberg) cells and are
therefore involved in the pathogenesis of Hodgkin's disease (HD).

The new protein can find application in modulating DNA repair and
mutagenesis and also in expression profiling in HD-related
syndromes.



20 
similarity to I-kappa-B-related protein

Sequenced by MediGenomix

Locus: unknown

Insert length: 1645 bp

Poly A stretch at pos. 1626, polyadenylation signal at pos. 1605
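The two positions quoted above can be checked against the last row of the sequence listing. The following is a minimal sketch, not the method used to produce the report: the helper names are hypothetical, the canonical AATAAA hexamer is assumed as the polyadenylation signal, and a run of at least ten A residues is assumed as the definition of the poly A stretch. Under these assumptions the quoted positions behave like 0-based offsets, whereas the ORF coordinates given for the same clone are 1-based.

```python
import re

# Last row of the DKFZphamy2_lcl2 listing; its first base is nucleotide 1601 (1-based).
TAIL = "".join("TTTCAAATAA AAATGTAATG TCCTGTAAAA AAAAAAAAAA AAAAA".split())
TAIL_OFFSET = 1600  # 0-based offset of the first base of TAIL within the insert


def find_polyadenylation_signal(seq, offset=0, hexamer="AATAAA"):
    """Return the 0-based offset of the first hexamer occurrence, or None."""
    index = seq.find(hexamer)
    return None if index < 0 else offset + index


def find_poly_a_stretch(seq, offset=0, min_len=10):
    """Return the 0-based offset of the first run of at least min_len A residues."""
    match = re.search("A{%d,}" % min_len, seq)
    return None if match is None else offset + match.start()


print(find_polyadenylation_signal(TAIL, TAIL_OFFSET))
print(find_poly_a_stretch(TAIL, TAIL_OFFSET))
```

Adding 1 to either result gives the 1-based nucleotide number in the listing.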



   1 GGATTTTCCT TGGTCTTAAG ATGGGTAGAA ATGTGATGCG ACACATGTCT
  51 GATGACTTAG GAAGTTATGT TTCTCTTTCG TGTGATGACT TTTCTTCACA
 101 GGAATTAGAG ATTTTCATTT GCTCCTTTTC CTCCTCCTGG CTTCAAATGT
 151 TTGTTGCAGA GGCAGTCTTT AAAAAGTTGT GTCTACAGAG CTCTGGCAGT
 201 GTTTCTTCTG AGCCACTCTC TCTTCAGAAA ATGGTATATT CCTATTTACC
 251 AGCCTTGGGG AAAACTGGTG TGCTTGGGTC TGGAAAGATT CAGGTGTCAA
 301 AGAAAATAGG ACAGCGGCCT TGTTTTGACT CTCAGAGAAC CTTAGTAATG
 351 CTGAATGGTA CTAAACAAAA ACAAGTCGAA GGGCTGCCAG AGTTACTAGA
 401 CCTGAACCTT GCTAAATGTT CCTCATCATT AAAAAAATTG AAAAAGAAGT
 451 CAGAAGGAGA ATTGTCATGT TCCAAGGAGA ATTGCCCCTC TGTAGTTAAA
 501 AAGATGAATT TTCACAAGAC TAATCTAAAA GGAGAAACAG CCCTGCATAG
 551 AGCTTGCATA AATAACCAAG TGGAGAAATT GATTCTTCTT CTCTCTTTGC
 601 CAGGAATAGA CATCAATGTT AAAGACAATG CTGGCTGGAC GCCTTTGCAT
 651 GAAGCCTGTA ACTATGGCAA CACAGTGTGT GTCCAGGAAA TTTTGCAACG
 701 TTGTCCAGAG GTAGATCTGC TCACTCAAGT GGACGGGGTG ACTCCTTTGC
 751 ATGATGCACT GTCAAACGGA CATGTAGAAA TTGGCAAGCT GCTACTACAG
 801 CATGGGGGCC CAGTGCTTTT ACAACAGAGG AATGCTAAGG GAGAATTGCC
 851 CTTGGATTAT GTGGTTTCAC CTCAAATCAA AGAAGAACTG TTTGCTATTA
 901 CAAAAATAGA AGATACAGTG GAGAACTTTC ATGCACAAGC AGAGAAACAT
 951 TTTCATTACC AGCAACTTGA ATTTGGCTCC TTTTTACTTA GTAGGATGTT
1001 GCTAAATTTT TGTTCAATTT TTGATTTATC TTCAGAGTTC ATTTTAGCTT
1051 CCAAAGGGTT AACTCATCTA AATGAACTGC TTATGGCTTG TAAAAGTCAT
1101 AAAGAAACCA CCAGTGTTCA TACTGACTGG TTACTGGATC TTTATGCTGG
1151 AAATATAAAG ACATTGCAGA AACTCCCACA CATTCTTAAG GAACTGCCTG
1201 AGAATTTGAA AGTGTGTCCT GGGGTACACA CTGAGGCCTT GATGATAACA
1251 TTGGAAATGA TGTGTCGGTC AGTCATGGAG TTTTCATGAT GATGCTAGAA
1301 AGTATGGATT GACTTTCTAA ATCTGTTCAG TTTGCATTGG TACTTACTGT
1351 GGACTTCATA GCTTACTGAC AGATAGTAAT TTGATTTATT TATTGACAGA
1401 CTTTGCAGCC TTGCTAAATT TTAAAAGCAT TTTTAAAAAA ACTTCTACAA
1451 AACTCTAGTA TGGGCTTCTG ACTTTTTCCA GGGTGTAGAA TTTGACTCAA
1501 AAGTAAAAAT AATTTTGTTT TAGTATATTC TACTTTCATT AATGTTTTTT
1551 TGTTCTGAAA GTGATATTAT ATTGTACATG TAAAATTAAT TTAAATATTT
1601 TTTCAAATAA AAATGTAATG TCCTGTAAAA AAAAAAAAAA AAAAA

BLAST Results

No BLAST result

Medline entries

No Medline entry

Peptide information for frame 3

ORF from 21 bp to 1286 bp; peptide length: 422
Category: similarity to known protein
Classification: Cell signaling/communication

  1 MGRNVMRHMS DDLGSYVSLS CDDFSSQELE IFICSFSSSW LQMFVAEAVF
 51 KKLCLQSSGS VSSEPLSLQK MVYSYLPALG KTGVLGSGKI QVSKKIGQRP
101 CFDSQRTLLM LNGTKQKQVE GLPELLDLNL AKCSSSLKKL KKKSEGELSC
151 SKENCPSVVK KMNFHKTNLK GETALHRACI NNQVEKLILL LSLPGIDINV
201 KDNAGWTPLH EACNYGNTVC VQEILQRCPE VDLLTQVDGV TPLHDALSNG
251 HVEIGKLLLQ HGGPVLLQQR NAKGELPLDY VVSPQIKEEL FAITKIEDTV
301 ENFHAQAEKH FHYQQLEFGS FLLSRMLLNF CSIFDLSSEF ILASKGLTHL
351 NELLMACKSH KETTSVHTDW LLDLYAGNIK TLQKLPHILK ELPENLKVCP
401 GVHTEALMIT LEMMCRSVME FS
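The relation between the ORF coordinates, the nucleotide listing and the peptide can be illustrated with a short translation routine. This is a sketch under stated assumptions rather than the tool used for the report: the function name is hypothetical, the standard genetic code is assumed, and only the first 50 nucleotides of the DKFZphamy2_lcl2 insert are used, with the ORF starting at nucleotide 21 (1-based).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order for each position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate complete codons from the start of dna, stopping at a stop codon."""
    peptide = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# First row of the listing (nucleotides 1 to 50), blocks joined.
ROW_1 = "".join("GGATTTTCCT TGGTCTTAAG ATGGGTAGAA ATGTGATGCG ACACATGTCT".split())
ORF_START, ORF_END = 21, 1286  # 1-based, inclusive

print(translate(ROW_1[ORF_START - 1:]))    # first ten residues of the peptide
print((ORF_END - ORF_START + 1) // 3)      # number of codons in the ORF
```

The ten residues produced from the first row agree with the start of the peptide listing, and the codon count agrees with the stated peptide length.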

40 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphamy2_lcl2, frame 3

PIR:A56429 I-kappa-B-related protein - human, N = 1, Score = 242,
P = 4.6e-18

TREMBLNEW:AF038042_1 gene: "BARD1"; product: "BRCA1-associated
RING domain protein"; Homo sapiens BRCA1-associated RING domain
protein (BARD1) gene, exons 10, 11 and complete cds, N = 1,
Score = 236, P = 6.9e-17

>PIR:A56429 I-kappa-B-related protein - human
Length = 481

HSPs:

Score = 242 (36.3 bits), Expect = 4.6e-18, P = 4.6e-18
Identities = 52/118 (44%), Positives = 71/118 (60%)
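The percentages in the HSP header follow directly from the match counts and the aligned length. The sketch below uses a hypothetical helper and assumes that the report truncates the fraction instead of rounding it; for the two values in this header both conventions give the same result.

```python
def hsp_percent(matches, aligned_length):
    """Integer percentage of matching columns, with the fraction truncated."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100 * matches // aligned_length


identities = hsp_percent(52, 118)
positives = hsp_percent(71, 118)
print("Identities = 52/118 (%d%%), Positives = 71/118 (%d%%)" % (identities, positives))
```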

10 <2uery: 15b 

PSVVKKI1NFHKTNLKGETALHRACINN(2VEKLILLLSLPGIDINVK:DNAGIiITPLHEACNY 215 
P K + + + N GET LHRACI (2 + ++ L+ G + N +J> 

GWTPLHEACNY 

Sbjct: 35M PGAAKGSKWNRRNDNGETLLHRACIEG<2LRRV(2DLVR- 
15 <2GHPLNPRDYCGliJTPLHEACNY M12 

<3uery: 21b GNTVCVt3EIL(3RCPEVDLL-- 
T(2Vl>GVTPLH]>ALSNGHVEIGKLLLr3HGGPVLL(3(3RNA 272 

G+ V+ +L VD +G+TPLHDAL+ GH E+ +LLL+ G V 

20 L+ R A 

Sbjct: M13 

GHLEIVRFLLPHG AAVDDPGG(2GCEGITPLH])ALNCGHFEVAELLLERGASVTLRTRKA M71 
Pedant information for DKFZphamy2_lcl2, frame 3

Report for DKFZphamy2_lcl2.3
[LENGTH]  422
[MW]      47071.18
[pI]      6.57
[HOMOL]   PIR:A56429 I-kappa-B-related protein - human 3e-11
[FUNCAT]  unclassified proteins [S. cerevisiae, YIL112w] 3e-11
[FUNCAT]  06.13.01 cytoplasmic degradation [S. cerevisiae, YGR232w] 4e-06
[FUNCAT]  30.10 nuclear organization [S. cerevisiae, YIR033w] 2e-04
[FUNCAT]  04.05.01.07 chromatin modification [S. cerevisiae, YIR033w] 2e-04
[SCOP]    d1awcb_ 1.91.3.1.2 GA binding protein (GABP) alpha GA bindin 6e-24
[EC]      3.1.3.53 Myosin-light-chain-phosphatase 9e-06
[PIRKW]   phosphotransferase 3e-07
[PIRKW]   tandem repeat 9e-06
[PIRKW]   transmembrane protein 7e-10
[PIRKW]   serine/threonine-specific protein kinase 3e-07
[PIRKW]   phosphoprotein 3e-10
[PIRKW]   integrin binding 3e-07
[PIRKW]   alternative splicing 3e-11
[PIRKW]   peripheral membrane protein 2e-09
[PIRKW]   transcription regulation 3e-06
[PIRKW]   phosphoric monoester hydrolase 9e-06
[PIRKW]   cytoskeleton 4e-10
[PIRKW]   smooth muscle 9e-06
[SUPFAM]  ankyrin 3e-11
[SUPFAM]  ankyrin repeat homology 3e-11
[SUPFAM]  unassigned ankyrin repeat proteins 7e-10
[PFAM]    Ank repeat
[KW]      Irregular
[KW]      3D
[KW]      LOW_COMPLEXITY 8.53 %

10 SEt2 nGRNVriRHI1SI>DLGSYVSLSCDDFSS(3ELEIFICSFSSSlJL(3nFVAEAVFKKLCL(aSSGS 

SEG xxxxxx 

lawcB 

15 SE(3 VSSEPLSL(3Kf1VYSYLPALGKTGVLGSGKI(3VSKKIG(3RPCFPS(3RTLLnLNGTK(2K(2VE 

SEG xxxxxxxx • • . 

lawcB 

20 SE<2 GLPELLPLNLAKCSSSLKKLKKKSEGELSCSKENCPS VVKKMNFHKTNLKGETALHRACI 

SEG xxxxxxxxxxxxxxxxxxxxxx 

lawcB 

25 SE<2 NN(2VEKLILLLSLPGIDINVKDNAGliJTPLHEACNYGNTVCV(3EILr3RCPEVDLLT(2Vl>GV 

SEG 

lawcB 

TTTTCCHHHHHHHHCCHHHHHHHHHCCCTTTTCTTTTC- 

30 SEtf TPLHDALSNGHVEIGKLLL(3HGGPVLL(3t2RNAKGELPLDYVVSP(3IKEELFAITKIEl>TV 

SEG - - 

lawcB 

CHHHHHHHHTTHHHHHHHHHCCCTT- . . 

35 SE<2 - ENFHA(3AEKHFHYr3(3LEFGSFLLSRnLLNFCSIFI>LSSEFILASKGLTHLNELLMACICSH 

SEG - 

lawcB 

40 SEt3 KETTSVHTDli)LL3>LYAGNIKTLr3KLPHILKELPENLKVCPGVHTEALMITLEIU1CRSVI1E 

SEG ... 

lawcB 



45 SE<2 FS 
SEG - - 
lawcB 



(No Prosite data available for DKFZphamy2_lcl2.3)

Pfam for DKFZphamy2_lcl2.3

55 

HMM_NAME Ank repeat 

HMM          *GyTPLHIAARyNNvEMVrlLLQH-GADIN*
              G+T+LH A+++N+VE   LLL + G DIN
Query    171  GETALHRACINNQVEKLILLLSLPGIDIN   199

34.48 (bits) f: 205 t: 232 Target: dkfzphamy2_lcl2.3
similarity to I-kappa-B-related protein
Alignment to HMM consensus:
Query        *GyTPLHIAARyNNvEMVrlLLQHGADIN*
              G+TPLH A+ Y + N+ +V + LQ+ + + +
dkfzphamy2 205 GWTPLHEACNYGNTVCVQEILQRCPEVD 232

Query f: 239 t: 266 Target: dkfzphamy2_lcl2.3
similarity to I-kappa-B-related protein
Alignment to HMM consensus:
HMM          *GyTPLHIAARyNNvEMVrlLLQHGADIN*
              G TPLH A + ++VE+ +LLLQHG +
Query    239  GVTPLHDALSNGHVEIGKLLLQHGGPVL   266

DKFZphamy2_lil

group: nucleic acid management

DKFZphamy2_lil encodes a novel 629 amino acid protein with
similarity to the murine hemin-sensitive initiation factor 2
alpha kinase.

The hemin-sensitive initiation factor 2 alpha kinase is expressed
predominantly in liver, spleen, colon and uterus and contains 2
protein kinase motifs. The mouse homologue inhibits protein
synthesis under stress conditions by phosphorylation of
eIF-2-alpha. Four different eIF2alpha kinases have been identified
in mammalian cells: the heme-regulated inhibitor (HRI), the
interferon-inducible RNA-dependent kinase (PKR), the endoplasmic
reticulum-resident kinase (PERK) and HGCN2. The new protein
represents a new member of this family.

The new protein can find application in modulating/blocking of
translation.

similarity to hemin-sensitive initiation factor 2 alpha kinase
(Mus musculus), complete cds.
probably complete in genomic clone DJ0DM2MD2

Sequenced by MediGenomix

Locus: /map = fT 3?-2 cR from top of Chr? linkage group"

Insert length: 2863 bp

Poly A stretch at pos. 2844, polyadenylation signal at pos. 2824



   1 GCAGTGCTGG GCTGGCCGGC GGGCTGGGCT GCGGCCCGCG CGCGGCCGGC
  51 GATGCAGGGG GGCAACTCCG GGGTCCGCAA GCGCGAAGAG GAGGGCGACG
 101 GGGCTGGGGC TGTGGCTGCG CCGCCGGCCA TCGACTTTCC CGCCGAGGGC
 151 CCGGACCCCG AATATGACGA ATCTGATGTT CCAGCAGAAA TCCAGGTGTT
 201 AAAAGAACCC CTACAACAGC CAACCTTCCC TTTTGCAGTT GCAAACCAAC
 251 TCTTGCTGGT TTCTTTGCTG GAGCACTTGA GCCACGTGCA TGAACCAAAC
 301 CCACTTCGTT CAAGACAGGT GTTTAAGCTA CTTTGCCAGA CGTTTATCAA
 351 AATGGGGCTG CTGTCTTCTT TCACTTGTAG TGACGAGTTT AGCTCATTGA
 401 GACTACATCA CAACAGAGCT ATTACTCACT TAATGAGGTC TGCTAAAGAG
 451 AGAGTTCGTC AGGATCCTTG TGAGGATATT TCTCGTATCC AGAAAATCAG
 501 ATCAAGGGAA GTAGCCTTGG AAGCACAAAC TTCACGTTAC TTAAATGAAT
 551 TTGAAGAACT TGCCATCTTA GGAAAAGGTG GATACGGAAG AGTATACAAG
 601 GTCAGGAATA AATTAGATGG TCAGTATTAT GGAATAAAAA AAATCCTGAT
 651 TAAGGGTGCA ACTAAAACAG TTTGCATGAA GGTCCTACGG GAAGTGAAGG
 701 TGCTGGCAGG TCTTCAGCAC CCCAATATTG TTGGCTATCA CACCGCGTGG
 751 ATAGAACATG TTCATGTGAT TCAGCCACGA GACAGAGCTG CCATTGAGTT
 801 GCCATCTCTG GAAGTGCTCT CCGACCAGGA AGAGGACAGA GAGCAATGTG
 851 GTGTTAAAAA TGATGAAAGT AGCAGCTCAT CCATTATCTT TGCTGAGCCC
 901 ACCCCAGAAA AAGAAAAACG CTTTGGAGAA TCTGACACTG AAAATCAGAA
 951 TAACAAGTCG GTGAAGTACA CCACCAATTT AGTCATAAGA GAATCTGGTG
1001 AACTTGAGTC GACCCTGGAG CTCCAGGAAA ATGGCTTGGC TGGTTTGTCT
1051 GCCAGTTCAA TTGTGGAACA GCAGCTGCCA CTCAGGCGTA ATTCCCACCT
1101 AGAGGAGAGT TTCACATCCA CCGAAGAATC TTCCGAAGAA AATGTCAACT
1151 TTTTGGGTCA GACAGAGGCA CAGTACCACC TGATGCTGCA CATCCAGATG
1201 CAGCTGTGTG AGCTCTCGCT GTGGGATTGG ATAGTCGAGA GAAACAAGCG
1251 GGGCCGGGAG TATGTGGACG AGTCTGCCTG TCCTTATGTT ATGGCCAATG
1301 TTGCAACAAA AATTTTTCAA GAATTGGTAG AAGGTGTGTT TTACATACAT
1351 AACATGGGAA TTGTGCACCG AGATCTGAAG CCAAGAAATA TTTTTCTTCA
1401 TGGCCCTGAT CAGCAAGTAA AAATAGGAGA CTTTGGTCTG GCCTGCACAG
1451 ACATCCTACA GAAGAACACA GACTGGACCA ACAGAAACGG GAAGAGAACA
1501 CCAACACATA CGTCGAGAGT GGGTACTTGT CTGTAGGCTT CACCCGAACA
1551 GTTGGAAGGA TCTGAGTATG ATGCCAAGTC AGATATGTAC AGCTTGGGTG
1601 TGGTCCTGCT AGAGCTCTTT CAGCCGTTTG GAACAGAAAT GGAGCGAGCA
1651 GAAGTTCTAA CAGGTTTAAG AACTGGTCAG TTGCCGGAAT CCCTCCGTAA
1701 AAGGTGTCCA GTGCAAGCCA AGTATATCCA GCACTTAACG AGAAGGAACT
1751 CATCGCAGAG ACCATCTGCC ATTCAGCTGC TGCAGAGTGA ACTTTTCCAA
1801 AATTCTGGAA ATGTTAACCT CACCCTACAG ATGAAGATAA TAGAGCAAGA
1851 AAAAGAAATT GCAGAACTAA AGAAGCAGCT AAACCTCCTT TCTCAAGACA
1901 AAGGGGTGAG GGATGACGGA AAGGATGGGG GCGTGGGATG AAAGTGGACT
1951 TAACTTTTAA GGTAGTTAAC TGGAATGTAA ATTTTTAATC TTTATTAGGG
2001 TATAGTTGGT ACAATGCTTC GTTGTATTTA GTAAGCCTTT ACAAGACTTG
2051 TTAAAGATGT CAGAGTGGCC CAAGCTGCCG TTCCTTCCCT TCCTGCCCCA
2101 CAAGCTCCTT TTCCTGAATT TCCTACCTAA ATATTAACCA TATGCCTAGT
2151 CTCTGAAACT AAAAACTTGG ACCTCATCCT CAATTATTTT CTCGTTTCAA
2201 CTCTGTTGAC CCTCTGTCTG GTCTTCCTCT AGAAGGTTCT ACCGCAGAAA
2251 TTGATGTGTG CTCCCTGCCC TCGTCACTGC CCAAGCCCGG GCCTGCACAT
2301 ACTCACTGGA CTGTTCCAGT TTTGACAGCT GCCAGTCTTC CTGCCCCTTT
2351 CACACTGCAG CTGAAGTTCA TTACCTGAAG GACGCCTCAT CATTTCATTC
2401 CTTGGCTCCA AACCTTCTGC TGCCTCTAAG ATAAAAGCTC AACTTCTTAA
2451 CAGTGTACAG TGTGCAACTT CCAACCTTTT TATCTGTTCT CTCCACCTTC
2501 AGTTTAGCGT CATTCCAAAA CCACACCCTT GCAAAGCTTT GTACTCCGCA
2551 CCCCAGATGA TCTCCAGGCA GCTCAGATCT CTTTCCTGCC TTTGCCCTGC
2601 ACTGTTCCCC GGTACTTCCT CCTTTATTGT AGCACTCAGC TCCCCAGCCA
2651 ATCTGTACAT CCCTCAGAGG CAGCGATCTG ATGAATTGGT TTTTGAATCC
2701 CAGAAAGGGT CTGCCATGGA GTTGGCAGTC ATCACGGTAG ATGGCGTATG
2751 ATTTTGCTGA ATTTTAAATA AAATGAAAAC CATAAATTAC ATGATGCTTT
2801 TATTGACACT TGACAACTGG CCTAAATAAA AAGACTCTGA CTCCAAAAAA
2851 AAAAAAAAAA AAA



BLAST Results 
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Entry AFOEflfiOfl from database EMBL : 

Mus musculus hemin-sensitive initiation factor 2 alpha kinase
mRNA, complete cds.

Score ■ bbflfi-. P = £.7e-E^b-, identities = n££/E53M 

Entry ACCOSTS from database EI1BL : 

Homo sapiens clone D JDD4EriDEn DORKING DRAFT SEQUENCE-, 13 
10 unordered 
pieces . 

Score = 511b-, P = D . De + QD-i identities = lEHCl/imfl 



Medline entries

ncmEDDT:

Berlanga J.J., Herrero S., de Haro C.; Characterization of the
hemin-sensitive eukaryotic initiation factor 2alpha kinase from
mouse nonerythroid cells; J. Biol. Chem. 273(48):32340-32349 (1998).



Peptide information for frame 1

ORF from 52 bp to 1938 bp; peptide length: 629
Category: similarity to known protein
Classification: Protein management
Prosite motifs: PROTEIN_KINASE_ATP (173-196)
                PROTEIN_KINASE_ATP (173-197)
                PROTEIN_KINASE_ST (437-449)
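A Prosite motif hit such as PROTEIN_KINASE_ST can be reproduced by turning the Prosite pattern into a regular expression and scanning the peptide. The sketch below is illustrative only: the converter is a hypothetical helper that handles just the syntax needed here, the pattern string is the serine/threonine kinase active-site signature as commonly quoted for PS00108 and should be checked against the Prosite release in use, and the test fragment is residues 421 to 450 of the peptide.

```python
import re

# Assumed pattern for PS00108 (PROTEIN_KINASE_ST); verify against the Prosite entry.
PS00108 = "[LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3)"

ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a simple Prosite pattern (no anchors) into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("unsupported pattern element: %r" % element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        if low is not None:
            core += "{%s}" % low if high is None else "{%s,%s}" % (low, high)
        parts.append(core)
    return "".join(parts)


def find_motif(peptide, pattern, first_residue=1):
    """Return (start, end) of the first hit in 1-based peptide coordinates, or None."""
    hit = re.search(prosite_to_regex(pattern), peptide)
    if hit is None:
        return None
    return first_residue + hit.start(), first_residue + hit.end() - 1


FRAGMENT = "IFQELVEGVFYIHNMGIVHRDLKPRNIFLH"  # residues 421-450 of the peptide
print(find_motif(FRAGMENT, PS00108, first_residue=421))
```

With this fragment the hit covers the residues IVHRDLKPRNIFL, which corresponds to the range listed for PROTEIN_KINASE_ST.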

  1 MQGGNSGVRK REEEGDGAGA VAAPPAIDFP AEGPDPEYDE SDVPAEIQVL
 51 KEPLQQPTFP FAVANQLLLV SLLEHLSHVH EPNPLRSRQV FKLLCQTFIK
101 MGLLSSFTCS DEFSSLRLHH NRAITHLMRS AKERVRQDPC EDISRIQKIR
151 SREVALEAQT SRYLNEFEEL AILGKGGYGR VYKVRNKLDG QYYAIKKILI
201 KGATKTVCMK VLREVKVLAG LQHPNIVGYH TAWIEHVHVI QPRDRAAIEL
251 PSLEVLSDQE EDREQCGVKN DESSSSSIIF AEPTPEKEKR FGESDTENQN
301 NKSVKYTTNL VIRESGELES TLELQENGLA GLSASSIVEQ QLPLRRNSHL
351 EESFTSTEES SEENVNFLGQ TEAQYHLMLH IQMQLCELSL WDWIVERNKR
401 GREYVDESAC PYVMANVATK IFQELVEGVF YIHNMGIVHR DLKPRNIFLH
451 GPDQQVKIGD FGLACTDILQ KNTDWTNRNG KRTPTHTSRV GTCLYASPEQ
501 LEGSEYDAKS DMYSLGVVLL ELFQPFGTEM ERAEVLTGLR TGQLPESLRK
551 RCPVQAKYIQ HLTRRNSSQR PSAIQLLQSE LFQNSGNVNL TLQMKIIEQE
601 KEIAELKKQL NLLSQDKGVR DDGKDGGVG



55 BLASTP hits 

No BLASTP hits available 
Alert BLASTP hits for DKFZphamy2_lil, frame 1

No Alert BLASTP hits found 

Pedant information for DKFZphamy2_lil, frame 1

Report for DKFZphamy2_lil.1

[LENGTH]  646

[pI]      5.80

[HOMOL]   SWISSNEW:HRI_MOUSE HEME-REGULATED EUKARYOTIC
INITIATION FACTOR EIF-2-ALPHA KINASE (EC 2.7.1.-) (HEME-REGULATED
INHIBITOR) (HRI) (HEME-CONTROLLED REPRESSOR) (HCR) (HEMIN-
SENSITIVE INITIATION FACTOR-2 ALPHA KINASE). 0.0

lEFUNCATJ D5.D7 translational control ES. cerevisiae-, YDREfl3c3 
5e-43 

20 EFUNCAT2 30-03 organization of cytoplasm ES. cerevisiae-. 
YDR2A3cJ Ee-M3 

EFUNCATJ JiO-OE-11 key kinases ES. cerevisiae*. Y0R531w3 ae-m 

EFUNCAT2 03-04 budding-, cell polarity and filament formation 
ICS- cerevisiae-. Y0RE31w3 fle-m 
25 EFUNCAT2 03-01 cell growth ES- cerevisiae-. Y0RE31w3 fie-14 

EFUNCATH 11.01 stress response IS- cerevisiae-. Y0R231w3 fie-lM 
EFUNCAT3 03 • EE cell cycle control and mitosis IS- cerevisiae-. 
Y0R231uO fie-m 

EFUNCATJ 30.10 nuclear organization ES. cerevisiae-. YKLlOlwJ 
30 fie-lE 

EFUNCAT3 unclassified proteins ES. cerevisiae-. YPLlSOwJ 

fie-lE 

EFUNCAT3 03-13 meiosis ES. cerevisiae-. YDRSE3cJ Ee-11 
EFUNCATID 03-10 sporulation and germination ES. cerevisiae-. 

35 YDR5E3cl Ee^ll -- - 

EFUNCATJ OT-01 biogenesis of cell wall ES - cerevisiae-. 

YPLmOcJ Me-11 

EFUNCATJ 10-03.11 key kinases ES - cerevisiae-, YCR073c3 ^e-ll 

EFUNCATID Tfi classification not yet clear-cut ES. cerevisiae-. 

40 YHROflEc J le-10 

EFUNCAT3 03-07 pheromone response-, mating-type determination-, 
sex-specific proteins ES. cerevisiaei YLR3bEw3 Ee-10 v 
EFUNCAT J 10.05.11 key kinases ES • cerevisiae -i YLR3bEwJ Ee-10 

EFUNCATID lO-OH-ll key kinases ES. cerevisiae-. YLR3tEw3 Ee-10 

45 EFUNCAT2 ID-'H other signal-transduct ion activities ES. 
cerevisiaei YDLlOlcJ 3e-10 

EFUNCAT3 ll-O 1 * dna repair (direct repair-, base excision repair 
and nucleotide excision repair) ES- cerevisiae-, YDLlOlcJ 3e-10 
EFUNCAT J 03 • ES cytokinesis ES. cerevisiae-. YDRS07c3 3e-10 
50 EFUNCAT3) OH-05-01-01 general transcription activities ES. 
cerevisiaei YDLlOflwJ le-D'l 

EFUNCATJ 03 -lb dna synthesis and replication ES. cerevisiae-. 

YBRlbOwJ le-m 

EFUNCATID 01- OS. DM regulation of carbohydrate utilization ES- 
55 cerevisiae-. YLR113w3 Me-O^ 

EFUNCAT3 OS -11 metabolism of energy reserves (glycogen-, 
trehalose) ES - cerevisiae-, YPLD31cHl le-OS 
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EFUNCAT2 0^D5.01-0M transcriptional control ES- cerevisiae-. 

YPL031cJ le-Ofl 

EFUNCAT J 01-OM-OM regulation of phosphate utilization ES- 
cerevisiae-. YPL031c3 le-Ofi 
5 EFUNCAT3 c energy conversion EM- genitaliumi J 2e-0fl 

EFUNCAT3 03-1^ recombination and dna repair ES- cerevisiae-. 

Y0R3Slc3 le-07 

EFUNCATJ 03-22-01 cell cycle check point proteins ES- 
cerevisiaei YPL153c3 le-0? 
10 EFUNCATID 10-05-0^ regulation of g-protein activity ES- 
cerevisiae-i YBLOlbuO 7e-07 

EFUNCATID O^OS-T^ other trna-transcr ipt ion activities ES - 

cerevisiaei YILD35cD le-0b 

EFUNCATl Dfi-13 vacuolar transport ES • cerevisiaei YGLlfiDuO 

15 le-Db 

EFUNCAT3 0b-13-0M lysosomal and vacuolar degradation ES- 
cerevisiaei YGLlflOwU le-Ob 

EFUNCAT3 DM »H other transcription activities ES- cerevisiae-, 
YERl^wID 2e-0b 

20 EFUNCATID 30-02 organization of plasma membrane ES- cerevisiae-, 
YDR122wJ 2e-0b 

EFUNCATID 30-07 organization of endoplasmat ic reticulum ES- 
cerevisiae-, YHR07 c ic3 3e-0b 

EFUNCAT2 Ol-Ob-10 regulation of lipid-, fatty-acid and sterol 
25 biosynthesis ES- cerevisiae-, YHR07Tc3 3e-0b 

EFUNCAT3 Ofi-T^ other intracel lulai — transport activities ES- 
cerevisiae-, YKLnficIB le-05 

EFUNCAT ID lO-OH-T^ other nutritional-response activities ES- 
cerevisiae-, YKLlTflcJ le-05 
30 EFUNCAT! OT-D 1 * biogenesis of cytoskeleton ES - cerevisiae-, 
YNL020cJ 1e-D5 

EFUNCATID Ob-07 protein modification ( glycolsy lat i on acylation-. 

myristylationi palmi ty lation i f arnesy lation and processing) 
ES- cerevisiae-, YFL033c3 Me-OM 
35 EFUNCATID Q1-02-0M regulation of nitrogen and sulphur utilization 
ES- cerevisiaei YNLlfl3cJ 7e-0M 

EBL0CKS2 BL00107A Protein kinases ATP-binding region proteins 

ESCOPHJ dlir3a_ 5-1-1-2-ki insulin receptor Complex 

(transf erase/substrate) le-22 
40 ESC0PJ dlfgkb_ S-l-1-2-5 Fibroblast growth factor 

receptor 1 Ehuman (Horn Te-27 

ESC0PJ dlphk ; 5-1-1-1-b gamma-subuni t of glycogen 

phosphorylase kinas 2e-23 

ESC0P3) dlabo S-l-l-l-lM Protein kiase CK2i alpha 

45 subunit EMaize (Ze le-23 

ESCOPJ d31ck 5-1-1-2-2 Lymphocyte kinase (lck) EHuman 

(Homo sapiens) 3e-22 

ESCOPJ d2erk_ 5-1. 1-1-11 MAP kinase Erk2 Erat (Rattus 

norvegicus) 7e-2Q 
50 ESC0P3 dlcdkb_ 5-1-1-1-2 cAIIP-dependent PKi catalytic 

subunit Comple be-n 

ESCOPJ dlhcl_ 5-1-1-1-1 Cycl in-dependent PK EHuman 

(Homo sapiens) 5e-21 

EEC31 2-7- 1-112 Protein-tyrosine kinase le-Ofl 

55 EEC} 2-7-l-12b beta-Adrenergic-receptor kinase 2e-DA 

EECJ 2-7-1-117 Myosin-light-chain kinase le-OT 

EEO 2-7-1-37 Protein kinase 5e-12 
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E.7-1-1E3 CaE+/calmodulin-dependent protein kinase Me- 

phosphotransf erase CO 
nucleus Te-CH 
RNA binding Ee-Bl 
duplication fle-10 
tandem repeat Me-OT 
zinc 5e-lB 

cell cycle control Be-DT 

serine/threonine-specif ic protein kinase 0-0 
transmembrane protein Be-OT 
zinc finger fle-10 
oncogene be-lE 
autophosphorylation Q.Q 
coat protein le-11 
magnesium Te-QT 
ATP □.□ 

polyprotein fc>e-lB 
receptor Te-OT 
phosphoprotein D-D 
sporulation Be-D^ 
glycoprotein Te-DT 
growth factor receptor ^e-11 
signal transduction Ee-12 

ser ine/thr eon ine/tyr os ine-specif ic protein kinase 

protein kinase fle-10 
transforming protein 2e-12 
heme binding 0-0 
purine nucleotide binding 26-10 
calcium binding Me-QT 
meiosis le-Ofl 

alternative splicing le-11 
P-loop Be-ID 
proto-oncogene 2e-12 
segmentation l 4e-lD 
stress-induced protein le-CH 
EF hand He-OS 
cell division le-CH 
calmodulin binding Me-OS 
LIN protein kinase fle-10 
calcium-dependent protein kinase He-CH 
rat protein kinase raf 5e-12 
AMP-activated protein kinase Be-Dfl 
protein kinase byr2 Se-OT 
SHE homology le-Ofl 

unassigned Ser/Thr or Tyr-specific protein kinases D-D 
leucine-rich alpha-2-glycoprotein repeat homology Te-DT 



EEC3 
OS 

EPIRKLO 
EPIRKLO 
EPIRKLO 
EPIRKLO 
EPIRKliO 
EPIRKLO 
EPIRKLO 
EPIRKLO 
EPIRKliO 
EPIRKLO 
EPIRKLO 
EPIRKLO 
EPIRKLO 
(EPIRKLO 
EPIRKliO 
EPIRKliO 
EPIRKliO 
EPIRKLO 
EPIRKliO 
EPIRKLO 
EPIRKLO 
EPIRKLO 
EPIRKLO 
ae-10 
EPIRKLO 
EPIRKLO 
EPIRKLO 
EPIRKLO 
EPIRKLO 
EPIRKLO 
EPIRKLO 
EPIRKLO 
EPIRKLO 
EPIRKliO 
EPIRKLO 
EPIRKLO 
EPIRKLO 
EPIRKLO 
ESUPFAI13 
ESUPFANJ 
ESUPFAI1J 
ESUPFAfO 
ESUPFAfO 
ESUPFAfO 
ESUPFAfO 
ESUPFAfO 

ESUPFAfO 
ESUPFAfO 
ESUPFAfO 
ESUPFAfO 
ESUPFAfO 
ESUPFAfO 
ESUPFAfO 
ESUPFAfO 
ESUPFAfO 



double-stranded RNA-binding repeat homology Ee-21 
histidine--tRNA ligase homology be-MB 
SAM homology 5e-QS 

avian retrovirus IC10 gag-Rmil-env polyprotein le-11 

LIfl metal-binding repeat homology 6e-10 

GCNB protein be-M2 

protein kinase homology 0-0 

protein kinase C zinc-binding repeat homology Be-IB 
Ca2+/calmodulin-dependent protein kinase II He-Ofl 
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ESUPFAIO beta-adrenergic-receptor kinase Ee-08 

ESUPFAfO kinase-related transforming protein be-12 

ESUPFAtO protein kinase A-raf 2e-lS 

ESUPFAMJ SH3 homology le-Dfl 
5 ESUPFAfO Ca2+/calmodul in-dependent protein kinase te-OI 

ESUPFArn protein kinase XaHl Te-OI 

ESUPFAfO calmodulin repeat homology Me-DT 

ESUPFAfO protein kinase DUN1 le-OI 

ESUPFAfO pleckstrin repeat homology Te-01 
10 ESUPFAfO protein kinase TIK Ee-El 

ESUPFAfO protein-tyrosine kinase tec le-Dfl 

ESUPFAfO kinase interaction domain homology le-DI 

EPROSITEJ PROTEIN_KINASE_ATP 2 

EPROSITEJ PROTEIN_KINASE_ST 1 
15 EPFAfO Eukaryotic protein kinase domain 

EKLO Irregular 

CKU3 3D 

EKU3 L0W_C0flPLEXITY ID . IS '/. 

EKtO COILED_C0IL 5-2b V. 

20 

SE£2 AVLGldPAGUAAARARPAflflGGNSGVRKREEEGDGAGAVAAPPAIDFPAEGPDPEYDESDV 

SEG • • • xxxxxxxxxxxxxx - - . . xxxxxxxxxxxxxxx • • • . ...... 

COILS 

25 ...... 

IjstA 



SEfl PAEIl2VLKEPLc3(2PTFPFAVANt3LLLVSLLEHLSHVHEPNPLRSR<3VFKLLC<2TFIKflGL 

30 SEG xxxxxxxxxxxxxxx . - 

COILS 



IjstA 



35 

SE(3 LSSFTCSl>EFSSLRLHHNRAITHLfJRSAKERVRi2DPCEl>ISRI(2J<IRSREVALEA<2TSRY 

SEG . ... 

COILS 



40 IjstA 



SEC? LNEFEELAILGKGGYGRVYKVRNKLDG(2YYAIKKILIKGATKTVCf1KVLREVKVLAGLfiH 

SEG 

45 COILS 



IjstA 

TTTEEEEEECCCBTTBCEEEEEETTTTCEEEEEEECCTTTTTTTTHHHHHHHHHHHTTTB 

50 SE<3 PNIVGYHTAWIEHVHVIcJPRDRAAIELPSLEVLSDfiEEDREflCGVKNDESSSSSIIFAEP 

SEG . 

COILS 



IjstA 

55 TTBC 

SEA TPEKEKRFGESDTEN<2NNKSVKYTTNLVIRESGELESTLEL<3ENGLAGLSASSIVE(2(JLP 
SEG • 
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COILS 



IjstA 



5 



SE<2 LRRNSHLEESFTSTEESSEENVNFLG<2TEAl2YHLnLHI<2l1<2LCELSLIi)DlilIVERNKRGRE 



COILS 



10 IjstA 



SE<2 YVDESACPYVn"ANVATKIF(2ELVE6VFYIHNI1GIVHRDLKPRNIFLHGP]><3<2VKIGI>FGL 

SEG 

15 COILS 



IjstA 



20 SEC! ACTDIL(2KNTDIiJTNRNGKRTPTHTSRVGTCLYASPE(2LEGSEYDAKSDf1YSLG VVLLELF 

SEG 

COILS 



IjstA 

25 • 

SEC c2PFGTE(1ERAEVLTGLRTGl3LPESLRKRCPV(3AKYI<2HLTRRNSS<2RPSAIi2LL6?SELF<2 

SEG 

COILS 

30 

IjstA 



SE<2 NSGNVNLTLflllKIIEQEKEIAELKKflLNLLSiaDKGVRDDGKDGGVG 

35 SEG • . . xxxxxxxxxxxxxx 

COILS • -CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 



xxxxxxxxxxxxx 



IjstA 



40 



Prosite for DKFZphamy2_lil.1

PS00107   190->214   PROTEIN_KINASE_ATP   PDOC00100
PS00107   190->213   PROTEIN_KINASE_ATP   PDOC00100
PS00108              PROTEIN_KINASE_ST    PDOC00100

Pfam for DKFZphamy2_lil.1



50 



HMM_NAME Eukaryotic protein kinase domain



55 



hum 

*YeigRiIGeGsFGtVYkCiUr.TGeIVAIKIIk.krsms F1REI 
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Query IfiM 

FEELAILGKGGYGRVYKVRNKLDCfiYYAIKKILIKGATKTVCIIKVLREV 23B 

Hnn qlMRrLnHPNIIRFYDwFedddDHI* 
5 ++++ L+HPNI+ + +++ ++ H+ 

fluery B33 KVLAGLtfHPNIVGYHTAWI-EHVHV ESb 

HUM 

*IYMIMEYHeGGDLFDYIrrng pMsEwelrf II1y(3IL 

10 + + + I1 + + + E +L+D+I++++ ++ + + +1 + 

+++ 

Query 3Tb LHItfnflLCEL- 

SLlilDliJIVERNKRGREYVDESACPYVflANVATKIFflELV HUB 

15 Hnn 

rGMeYLHSMgllHRDLKPENILIDeN -gqlKIcDFGLARqMn 

+ G + Y+H + I1GI+HRDLKP+NI++ + C2 + KI+DFGLA + 

Query MMM 

EGVFYIHNHGIVHRI>LKPRNIFLHGPDt3fiVKIGI>FGLACTI>IL(3KNT]>LIT ^3 

20 

hmm 

-nYerllttf CGTPUYnriAPEVIImgnyYttkVDriUSFGCILUEnMT 

+ T+++GT Y +PE ++G++Y+ K+DM+S+G++L 

E + + 

25 fluery MTM NRNGKRTPTHTSRVGTCLYA-SPEfl- 

LEGSEYDAKSDMYSLGVVLLELF- SMO 

Hnn 

GepPFyd. • dnHemlmrliqr - f rrpf UpnCSeElyDFMrwCUnyDPekR 
30 +PF ++ E + ++ + ++ ++ +C+ +++ + + +++ 

++R 

tfuery 5M1 — t2PFGTENERAEVLTGLRTG<2LPESLRKRCPVr3AKYI(2-- 

HLTRRNSS<3R 567 

35 HMM PTFrdlLnHPWF* 

P++ (3 + L+ + F 
Query Sfifl PSAIflLLGSELF S^T 
group: transmembrane proteins

DKFZphamy2_lil4 encodes a novel 617 amino acid protein with
similarity to the human l(3)mbt protein homolog.

Mutations of the Drosophila l(3)mbt gene lead to malignant brain
tumors. The novel protein contains 1 transmembrane domain.
No informative BLAST results; no predictive Prosite, Pfam or SCOP
motif.

The new protein can find application in studying the expression
profile of oncogenes and amygdala-specific genes and as a new
marker for amygdala cells.

similarity to Human l(3)mbt protein homolog mRNA

> 14 exons (HS756G23 (EMBLNEW))
Pedant: TRANSMEMBRANE 1

Sequenced by MediGenomix

Locus: /map="22q13.31-13.33"

Insert length: 3071 bp
Poly A stretch at pos. 3052, no polyadenylation signal found



   1 GGCAGGCCAA TATGGCTTCC TGCACCTGGT GACGCTTGGC GAAACTGAGG
  51 TCTCATGGAG AAGCCCCGGA GTATTGAGGA GACCCCATCT TCAGAACCAA
 101 TGGAGGAAGA GGAAGATGAC GACTTGGAGC TGTTTGGTGG CTATGATAGT
 151 TTCCGGAGTT ATAACAGCAG TGTGGGCAGT GAGAGCAGCT CCTATCTGGA
 201 GGAGTCAAGT GAAGCAGAAA ATGAGGATCG GGAAGCAGGG GAACTGCCGA
 251 CCTCCCCGCT GCATTTGCTC AGCCCTGGGA CTCCTCGCTC CTTGGATGGC
 301 AGTGGTTCTG AGCCAGCTGT CTGTGAGATG TGTGGTATCG TGGGTACAAG
 351 GGAAGCCTTC TTCTCCAAGA CCAAGAGGTT CTGCAGCGTC TCCTGCTCCA
 401 GGAGCTACTC CTCCAACTCC AAGAAAGCCA GTATCTTGGC TAGATTACAG
 451 GGAAAACCAC CGACCAAAAA AGCCAAAGTC CTGCACAAGG CTGCCTGGTC
 501 TGCCAAAATT GGAGCCTTCC TCCACTCTCA AGGGACAGGA CAGCTGGCAG
 551 ATGGGACACC AACAGGACAA GACGCTCTGG TCTTGGGCTT CGACTGGGGG
 601 AAGTTCCTGA AGGATCACAG TTACAAGGCT GCTCCCGTCA GCTGTTTCAA
 651 GCACGTCCCA CTCTATGACC AGTGGGAGGA TGTGATGAAA GGGATGAAGG
 701 TGGAGGTGCT CAACAGTGAT GCTGTGCTCC CCAGCCGGGT GTACTGGATC
 751 GCCTCTGTCA TCCAGACAGC AGGGTATCGG GTGCTGCTTC GGTATGAAGG
 801 CTTTGAAAAT GACGCCAGCC ATGACTTCTG GTGCAACCTG GGAACAGTGG
 851 ATGTCCACCC CATTGGCTGG TGTGCCATCA ACAGCAAGAT CCTAGTGCCC
 901 CCACGGACCA TCCATGCCAA GTTCACCGAC TGGAAGGGCT ACCTCATGAA
 951 ACGGCTGGTG GGCTCCAGGA CGCTTCCCGT GGATTTCCAC ATCAAGATGG
1001 TGGAGAGCAT GAAGTACCCC TTTAGGCAGG GCATGCGGCT GGAAGTGGTG
1051 GACAAGTCCC AGGTGTCACG CACTCGCATG GCTGTGGTGG ACACAGTAAT
1101 CGGGGGTCGC CTACGGCTCC TCTACGAGGA TGGTGACAGT GACGACGACT
1151 TCTGGTGCCA CATGTGGAGC CCCCTGATCC ACCCAGTGGG TTGGTCACGA
1201 CGTGTGGGCC ACGGCATCAA GATGTCAGAG AGGCGAAGTG ACATGGCCCA
1251 TCACCCCACC TTCCGGAAGA TCTACTGTGA TGCCGTTCCT TACCTCTTCA
1301 AGAAGGTACG AGCAGTCTAC ACAGAAGGCG GTTGGTTTGA GGAAGGGATG
1351 AAGCTGGAGG CCATTGACCC CCTGAATCTG GGCAACATCT GCGTGGCAAC
1401 TGTCTGTAAG GTTCTCCTGG ATGGATACCT GATGATCTGT GTGGACGGGG
1451 GGCCCTCCAC AGATGGCTTG GACTGGTTCT GCTACCATGC CTCTTCCCAC
1501 GCCATCTTCC CGGCCACCTT CTGTCAGAAG AATGACATTG AGCTCACACC
1551 GCCAAAAGGT TATGAGGCAC AGACTTTCAA CTGGGAGAAC TACTTGGAGA
1601 AGACCAAGTC GAAAGCCGCT CCATCGAGAC TCTTTAACAT GGATTGCCCA
1651 AACCATGGCT TCAAGGTGGG CATGAAGCTG GAGGCCGTGG ACCTGATGGA
1701 GCCCCGGCTC ATCTGTGTGG CCACGGTGAA ACGAGTGGTG CATCGGCTCC
1751 TCAGCATCCA CTTTGACGGC TGGGACAGCG AGTACGACCA GTGGGTGGAC
1801 TGCGAGTCCC CAGACATCTA CCCCGTCGGC TGGTGTGAGC TCACCGGCTA
1851 CCAGCTCCAG CCTCCTGTGG CCGCAGGTGT GGGCTCTCGT GGCCCTAAGA
1901 GGCTCTGACT TTCTTTCCTC TTCTTTTTTC CTTCTTCCCC CGCCCCTGTG
1951 CCCATCTGCG TTCTTTGGCA TGAGGTGGAG ATGTCTCATG GACCACTTTA
2001 AGTAGAGAGT GAGCCCCGTC ACCCAGCCCC TGCfCCTGAC TTCTCTGTCT
2051 CCCTTTCCCT CTGGCCTGCA GAGCTCCTTC CTTCATCTTG CCCACTCTGT
2101 CATATGTTCG TGCCCTTGTG CACCCAGGTA AACTACCCAG GTCCCTCTGA
2151 GCAGCCCTGG TAACAAGGGT GGGAAGAAGG GACAGCTGTT CTCCGGCCCC
2201 TCCTCCAGCC CCGCCCTCTC CTCATTGCCC AGGTTTGGCT TCCTGTCTTG
2251 GGGTGTCTCG TGTGGGAGGG TGGATGGGGT CTCGGGATGC GCCTGTGCCC
2301 TGTGTCCTCC CAGGGACCCT CTTCTCATCT CTTTCACCCT TGTCTTTCAA
2351 CAACAGAACC GGCCACACCG CTGAAGGCCA AAGAGGCCAC AAAGAAGAAA
2401 AAGAAACAGT TTGGGAAGAA AAGGAAAAGA ATCCCGCCCA CTAAGACGCG
2451 ACCCCTCAGA CAGGGGTCCA AGAAGCCCCT GCTGGAGGAC GACCCTCAGG
2501 GTGCCAGGAA GATCTCGTCG GAGCCTGTTC CTGGCGAGAT CATTGCTGTG
2551 CGTGTGAAGG AAGAGCATCT AGACGTGGCC TCGCCCGACA AGGCTTCAAG
2601 TCCAGAGCTG CCTGTCTCCG TCGAGAACAT CAAGCAGGAA ACAGACGACT
2651 GAGCCTTCCT GCCTCCAGCC TGGCTTCTAG CTGGAAGCCA GCCCAGCGTT
2701 TCTCTACCAC CACCACCATG CCTCCACCTG ACTTTGGCTT GGAGACTGAT
2751 CCTCTCTGTG TAAATTCTGC CCGGTGCTGT GAAGGCTGGA CGGTGGAGGA
2801 CCTGCTGGGG TCTCCTGGGA CCCGCCTGTT GCTTCTGCCC TCCCCTGTGG
2851 AAAGGTCTAT ATGACGGGCC GCCTGAGGCC CCAGAACTCG TCTGTGAACC
2901 ACCTTTTCCA GCCAGAGTTC CCAAAGCTGG AACGCTAGCT GCCTGCTCTT
2951 CCTTAAGATG GCCTCCCCCC GACCCGCCAC GGCCCTCAGT TGCCAGGGAT
3001 GGGGCCACCA CTGTCACACT GTGGAATACA AGACAGTGAA CTCTGTCTGC
3051 CTAAAAAAAA AAAAAAAAAA A



BLAST Results

Entry HS756G23 from database EMBLNEW:
Human DNA sequence from clone 756G23 on chromosome 22q13.31-13.33
Score = 3939, P = 0.0e+00, identities = 875/954

Entry U89358_1 from database TREMBL:
product: "l(3)mbt protein homolog"; Human l(3)mbt protein homolog
mRNA, complete cds.
Score = 505, P = 7.2e-45, identities = 123/320, positives = 170/320,
frame +1

Entry AB014581_1 from database TREMBL:
gene: "KIAA0681"; product: "KIAA0681 protein"; Homo sapiens mRNA
for KIAA0681 protein, partial cds.
Score = 503, P = 1.4e-46, identities = 122/307, positives = 163/307,
frame +1

PCT/IB01/02050



Medline entries

No Medline entry

Peptide information for frame 1

ORF from 55 bp to bp, peptide length: 617

Category: similarity to known protein
Classification: unclassified


  1 MEKPRSIEET PSSEPMEEEE DDDLELFGGY DSFRSYNSSV GSESSSYLEE
 51 SSEAENEDRE AGELPTSPLH LLSPGTPRSL DGSGSEPAVC EMCGIVGTRE
101 AFFSKTKRFC SVSCSRSYSS NSKKASILAR LQGKPPTKKA KVLHKAAWSA
151 KIGAFLHSQG TGQLADGTPT GQDALVLGFD WGKFLKDHSY KAAPVSCFKH
201 VPLYDQWEDV MKGMKVEVLN SDAVLPSRVY WIASVIQTAG YRVLLRYEGF
251 ENDASHDFWC NLGTVDVHPI GWCAINSKIL VPPRTIHAKF TDWKGYLMKR
301 LVGSRTLPVD FHIKMVESMK YPFRQGMRLE VVDKSQVSRT RMAVVDTVIG
351 GRLRLLYEDG DSDDDFWCHM WSPLIHPVGW SRRVGHGIKM SERRSDMAHH
401 PTFRKIYCDA VPYLFKKVRA VYTEGGWFEE GMKLEAIDPL NLGNICVATV
451 CKVLLDGYLM ICVDGGPSTD GLDWFCYHAS SHAIFPATFC QKNDIELTPP
501 KGYEAQTFNW ENYLEKTKSK AAPSRLFNMD CPNHGFKVGM KLEAVDLMEP
551 RLICVATVKR VVHRLLSIHF DGWDSEYDQW VDCESPDIYP VGWCELTGYQ
601 LQPPVAAGVG SRGPKRL



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphamy2_lil4, frame 1

TREMBL:AB014581_1 gene: "KIAA0681"; product: "KIAA0681 protein";
Homo sapiens mRNA for KIAA0681 protein, partial cds, N = 1,
Score = 503, P = 3.1e-48

TREMBL:U89358_1 product: "l(3)mbt protein homolog"; Human l(3)mbt
protein homolog mRNA, complete cds, N = 1, Score = 505,
P = 6.2e-48

>TREMBL:U89358_1 product: "l(3)mbt protein homolog"; Human
l(3)mbt protein homolog mRNA, complete cds.
Length = 775
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HSPs: 

Score = 505 (75.8 bits), Expect = 6.2e-48, P = 6.2e-48
Identities = 123/313 (39%), Positives = 170/313 (54%)

fiuery: 2^3 WKGYLriKRLVGSRTLPVDFH-- 
IKMVESriKYPFR(3Gf1RLE VVDKSfiVSRTRIlA VVDTVIG 3SD 

U+ YL ++ + T PV + V K F+ GI1+LE + D S + 

10 V V G 

Sbjct: 20fl UESYLEEc3K:~ 

AITAPVSLF(3DS(3AVTHNKN6FKLGriKLEGIDP(3HPSriYFILTVAEVC6 2b5 

(Query: 351 GRLRLLYEDGDSD-DDFliJCHriUSPLIHPVGUSRRVGHGIKtlSE-- 
15 RRSDNAHHPTFRKIY MD7 

RLRL + DG S + DFW + SP IHP GUI ■+ GH +++ + + + + 
Sbjct: 2bb YRLRLHF- 

DGYSECHDFUVNANSPDIHPAGUFEKTGHKLt3LPKGYKEEEFSLISt3YI1CSTR 32M 

20 fluery: i*Da CDAVP- 

YLFKKVRAVYTEGGUFEEGMKLEAIDPLNLGNICVATVCKVLLDGYLIIICVDGG Mbb 

A P ++F G F+ GMKLEA+D +N +CVA+V V + D 

++ D 

Sbjct: 325 A(2AAPKH>IFVS(3SHSPPPLG-F(2VGnKLEAVDRI1NPSLVCVASVTD VV- 
25 DSRFLVHFDNli) 362 

fluery: Mb? PSTDGLDWFCYHASSHAIFPATFCtf KNDIELTPPKGY- 
EAflTFNUENYLEKTKSKAAPSR 525 

T D++C SS IP +C(3K LTPP+ Y + F WE YLE+T 

30 .+ A P+ 

Sbjct: 3fl3 DDT--YDYUC- 

DPSSPYIHPVGWC{3K(3GKPLTPP(2I>YPDPI>NFCUEKYLEETGASAVPTliJ WS^ 
tfuery: 52b 

35 LFNnPCPNHGFKVGnKLEAVDLnEPRLICVATVICRVVHRLLSIHFDGlJDSEYDflLIVDCES 5fi5 

F + P H F V MKLEAVD P LI VA+V+ V + IHFDGL) 

YD bl + D + 

Sbjct: HUD AFKVR- 

PPHSFLVNriKLEAVDRRNPALIRVASVEDVEDHRIKIHFDGWSHGYDFUIDADH M^fi 

40 

(2uery: 5flb PDIYPVGUCELTGYQLdPPV b05 

PDI+P GUC TG+ L<2PP + 
Sbjct: MTT PDIHPAGUCSKTGHPLdPPL 51fl 

Score = 333 (50.0 bits), Expect = 4.1e-27, P = 4.1e-27
Identities = 103/324 (31%), Positives = 151/324 (46%)

fluery: 17T FDbJGKFLKDHSYKAAPVSCFKHVPLYDtSWEDVflK- 
GNKVEVLNSDAVLPSRVYWIASVId 237 
50 + Id +L + + APVS F+ ++ K GMK+E + D PS 

+Y+I +V + 

Sbjct: 2Qb USUESYLEEflKAITAPVSLFflDStJAVTHNKNGFKLGHKLEGI — DP (2 HPS- 
MYFILTVAE 2b2 

55 fiuery: 23fl 

TAGYRVLLRYEGFENDASHDFUCNLGTVDVHPIGUCAINSKILVPPRTIHAKFTDUKGYL 2^7 
GYR+ L ++G+ HDFld N + D + HP GW L P + 

W Y +

Sbjct: 2b3 VCGYRLRLHFDGYSE-- 

CHDFUVNANSPDIHPAGUFEKTGHKLtJLPKGYKEEEFSliJSfiYri 320 

fluery: 21ft NKRLVGSRTLPvDFHIKflVESIIKYP 

FR<2GP1RLEVVDKS(3VSRTRI1AVVDTVIGGRLR 3SM 

+R H+ + +S P F+ GI1 + LE VD + S +A V 

V+ R 

Sbjct: 321 CS 

TRAflAAPKHMF VS0SHSPPPLGF<2VGHKLEAVDRMNPSLVCVASVTDVVDSRFL 37b 



(3uery: 3SS LLYEDGDSDDDFUCHIIlilSPLIHPvGIJSRRVGHGIKIISERRSI) 

MAHHPTFRKIYCDAV 411 

+ +++ d j> +uc SP iHPVGId ++ G + + D 

+ AV 
15 Sbjct: 377 

VHFDNIdDDTYD YUCJ)PSSPYIHPVGUC(2Kt3GKPLTPP(2DYP]>PDNFChlEKYLEETGASAV 43b 
tJuery: 412 

PYLFKKVRAVYTEGGUFEEGflKLEAIDPLNLGNICVATVCKVLLDGYLrilCVDGGPSTDG 471 
20 P KVR ++ F HKLEA + D N I VA+V V D + I 

DG + G 

Sbjct: 437 PTUAFKVRPPHS FLVNI1ICLEAVDRRNPALIRVASVEDVE- 

DHRIKIHFDGU — SHG 461 

25 fluery: 472 LDUFCYHASSHAIFPATFCflKNDIELTPPKG 5D2 

P F A I PA +C K L PP fi 
Sbjct: 410 YD-FUIDADHPDIHPAGUCSKTGHPLdPPLG 511 

Score = 236 (35.4 bits), Expect = 2.5e-16, P = 2.5e-16
Identities = 47/110 (42%), Positives = 66/110 (60%)

fluery: 111 PPKGYEAflTFNWENYLEKTKSKAAPSRLF-NMDCPNH 

GFKVGflKLEAVDLFIEPRLIC 554 

p g + + ++UE+YLE+ K + AP LF + H GFK+GHKLE +D 

35 P + 

Sbjct: 117 

PATGEKKECUSUESYLEE(3KAITAPVSLFc3I>S(3AVTHN<NGFKLGf1KLEGIDP<3HPSnYF 25b 

fluery: 555 VATVKRVVHRLLSIHFDGWDSE YD<2UVDCESPDIYPVGUCELTGY<2Le2PP 
40 bD4 

+ TV V L +HFDG+ +» UV+ SPDI+P Ghl E TG++LQ P 
Sbjct: 257 ILTVAEVCGYRLRLHFDGYSECHDFUVNANSPDIHPAGUFEKTGHKLflLP 



3Db 
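
The whole-number percentages printed beside each HSP's identity and positive counts appear to be truncated rather than rounded. For example, 103 identities over 324 aligned positions is 31.8%, which the report gives as 31%. The following is a minimal Python sketch of that arithmetic, assuming integer truncation; the function name `hsp_percent` is illustrative and is not part of any BLAST tool.

```python
def hsp_percent(matches: int, aligned: int) -> int:
    """Whole-number percentage, truncated toward zero.

    Truncation is an assumption inferred from the HSP summaries in this
    report, not a documented guarantee of the BLAST program that produced them.
    """
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


# (identities, positives, aligned length) for the three HSPs against
# the l(3)mbt protein homolog listed above
hsps = [(123, 170, 313), (103, 151, 324), (47, 66, 110)]

for identities, positives, aligned in hsps:
    print(
        f"Identities = {identities}/{aligned} ({hsp_percent(identities, aligned)}%), "
        f"Positives = {positives}/{aligned} ({hsp_percent(positives, aligned)}%)"
    )
```

Positives count identical residues plus conservative substitutions, so the positive percentage is never lower than the identity percentage for the same HSP.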



Pedant information for DKFZphamy2_1i14, frame 1
Report for DKFZphamy2_1i14.1

[LENGTH] 617

EMU3 b12b4.11 

[pI] 6.05

[HOMOL] TREMBL:U89358_1 product: "l(3)mbt protein homolog"; Human
l(3)mbt protein homolog mRNA, complete cds. 1e-47

[BLOCKS] BL01206A Amiloride-sensitive sodium channels proteins
[KW] TRANSMEMBRANE 1

[KW] LOW_COMPLEXITY

5 SEfl MEKPRSIEETPSSEPMEEEEDDDLELFGGYDSFRSYNSSVGSESSSYLEESSEAENEDRE 

SEG '-' xxxxxxxxxxxxxxxxxxx xxxx xxxxxxxxxxxxxxxxxxx 

PRD ccccceeeeccccccccccccccccccccccccccccccccccccccccccccccccccc 

MEM 

10 SE<2 AGELPTSPLHLLSPGTPRSLDGSGSEPAVCEMCGIVGTREAFFSKTKRFCSVSCSRSYSS 

SEG ■ -• xxxxxxxxxx 

PRD ccccccccccccccccccccccccccceeeeeecccccccccccccccceeeecccccce 

I1EI1 • • 

15 SE<2 NSKKASILARLflGKPPTKKAKVLHKAAUSAKIGAFLHS(2GTGi3LADGTPTG(2DALVLGFD 

SEG xxxxxx • • • • 

PRD ccchhhhhhhhhcccccccchhhhhhhhhhhhhhhcccccccccccccccccceeeeecc 

mem 

20 SE(J U6KFLKDHSYKAAPVSCFKHVPLYI>flLIEDVriKGf1KVEVLNSI>AVLPSRVYliJIASVI(JTAG 

SEG 

PRI> chhhhhhccccccccccccccccccccchhhhhheeeeeccccccceeeehhhhhhhhhc^ 

MEM 

25 SEfl YRVLLRYEGFENDASHDFUICNLGTVDVHPIGUCAINSKILVPPRTIHAKFTDWICGYLI1KR 

SEG 

PRD ceeeeeeccccccccceeeeccccccccccccccccceeeccccccccccccchhhhhhh 

riEii - 

30 SE<2 LVGSRTLPVDFHIK(1VESHKYPFR(3GI1RLE VVDKS(3VSRTRMAVVDTVIGGRLRLLYEDG 

SEG • - - • - 

PRD hccccccccccccccccccccccccccceeeecccccceeeeeeeeeccccceeeeeccc 

MEI1 

35 SEfl: DSDDDFUCHnUSPLIHPVGUSRRVGHGIKIISERRSDnAHHPTFRK-IYCDAVPYLFKKVRA 

SEG 

PRD cccceeeeeccccccccccccccccccccccccccccccchhhhhhcccccccccccccc 

MEM • 

40 SEfl VYTEGGlilFEEGMKLEAIDPLNLGNICVATVCKVLLDGYLMICVDGGPSTDGLDUFCYHAS 

SEG 

PRD ccccccchhhhheeeeccccccceeeeeeehhhhhcceeeeeeccccccccceeeeeecc 

MEM MMMMMMMMMMMMMMMMM 

45 SE<2 SHAIFPATFC(3KNDIELTPPKGYEA(3TFNUENYLEKTKSKAAPSRLFNMDCPNHGFKVGM 

SEG 

PRD cccccccccccccccccccccccccchhhhhhhhhhhhccccccccccccccchhhhhhe 

MEM : • . . . 

50 SEfl KLEAVDLMEPRLICVATVKRVVHRLLSIHFDGUDSEYDflUVDCESPDIYPVGUCELTGYQ 

SEG - 

PRD eeeccccccccGeGeeehhhhhhhheeeeeccccccccccccccccccccceeeeccccc 

MEM 

55 SEfi LflPPVAAGVGSRGPKRL 

SEG 

PRD ccccccccccccccccc 

MEM

(No Prosite data available for DKFZphamy2_1i14.1)
(No Pfam data available for DKFZphamy2_1i14.1)

group: differentiation/development

DKFZphamy2_1i24 encodes a novel 835 amino acid protein with
partial similarity to Rattus norvegicus Notch2 protein.

Notch family molecules are thought to be negative regulators of
neuronal differentiation in early brain development. Notch2 is
expressed not only by neuronal cells in the embryonic brain, but
also by glial cells in the postnatal brain. The new protein
represents a new member of this family and may be involved in
specific differentiation or developmental pathways of the nervous
system.

The new protein can find application in modulating development
and differentiation of amygdala cells.

putative protein
probably complete cds.

Sequenced by MediGenomix

Locus: unknown

Insert length: 2768 bp

Poly A stretch at pos. 2?m-i polyadenylat ion signal at pos- 2k^7 



1 AGAAATCTTC AGCCAAACAG CTGCAGGA AG TAGAGAAGGT TAA ACCCCAG 

35 51 AGTGAGAAAG TTCATCAGAC TCTGATTCTG GACCCAGCAC AGAGGAAGAG 

101 ACTCCAGCAG CAGATGCAGC AGCACGTTCA GCTCTTGACC CAAATCCACC 
151 TTCTTGCCAC CTGCAACCCC AACCTCAATC CGGAGGCCAC TACCACCAGG 
201 ATATTTCTTA AAGAGCTGGG A ACCTTTGCT CAAAGCTCCA TCGCGCTTCA 
251 CCATCAGTAC AACCCCAAGT TTCAGACCCT GTTCCAACCC TGTAACTTGA 

40 301 TGGGAGCTAT GCAGCTGATT GAAGACTTCA GCACACATGT CAGCATTGAC 

351 TGCAGCCCTC ATAAAACTGT CAAGAAGACT GCGAATGAAT TTCCCTGTTT 
M01 GCCAAAGCAA GTGGCTTGGA TTCTGGCCAC AAGCAAGGTT TTCATGTATC 
H51 CAGAGTTACT TCCAGTGTGT TCCCTGAAGG CAAAGAATCC CCAGGATAAG 
501 ATCGTCTTCA CCAAGGCTGA GGACAATTTG TTAGCTTTAG GACTGA AGCA 

45 551 TTTTGAAGGA ACTGAGTTTC CTAATCCTCT AATCAGCAAG TACCTTCTAA 

fcOl CCTGCAAAAC TGCCCACCAA CTGACAGTGA GAATCAAGAA CCTCAACATG 
b51 AACAGAGCTC CTGACAACAT CATTAAATTT TATAAGAAGA CCAAACAGCT 
701 GCCAGTCCTA GGAAAATGCT GTGAAGAGAT CCAGCCACAT CAGTGGAAGC 
751 CACCTATAGA GAGAGAAGAA GACCGGCTCC CATTCTGGTT AAAGGCCAGT 

50 601 CTGCCATCCA TCCAGGAAGA ACTGCGGCAC ATGGCTGATG GTGCTAGAGA 
651 GGTAGGAAAT ATGACTGGAA CCACTGAGAT CAACTCAGAT CGAAGCCTAG 
^01 AAAAAGACAA TTTGGAGTTG GGGAGTGAAT CTCGGTACCC ACTGCTATTG 
=151 CCTAAGGGTG TAGTCCTGAA ACTGAAGCCA GTTGCCACCC GTTTCCCCAG 
1001 GAAGGCTTGG AGACAGAAGC GTTCATCAGT CCTGA AGCCC CTCCTTATCC 

55 1051 AACCCAGCCC CTCTCTCCAG CCCAGCTTCA ACCCTGGGAA AACACCAGCC 
1101 CGATCAACTC ATTCAGAAGC CCCTCCGAGC AAAATGGTGC TCCGGATTCC 
1151 TCACCCAATA CAGCCAGCCA CTGTTTTACA GACAGTTCCA GGTGTCCCTC 
1201 CACTGGGGGT CAGTGGAGGT GAGAGTTTTG AGTCTCCTGC AGCACTGCCT
1251 GCTGTGCCCC CTGAGGCCAG GACAAGCTTC CCTCTGTCTG AGTCCCAGAC

1301 TTTGCTCTCT TCTGCCCCTG TGCCCAAGGT AATGCTGCCC TCCCTTGCCC 

1351 CTTCTAAGT.T TCGAA AGCCA TATGTGAGAC GGAGACCCTC AAAGAGAAGA 

1M01 GGAGTCAAGG CCTCTCCCTG TATGAAACCT GCCCCTGTTA TCCACCACCC 

5 IHSl TGCATCTGTT ATCTTCACTG TTCCTGCTAC CACTGTGAAG ATTGTGAGCC 

1501 TTGGCGGTGG CTGTAACATG ATCCAGCCTG TCAATGCGGC TGTGGCCCAG 

1551 AGTCCCCAGA CTATTCCCAT CACTACCCTC TTGGTTAACC CTACTTCCTT 

IbDl CCCCTGTCCA TTGAACCAGT CCCTTGTGGC CTCCTCTGTC TCACCCTTAA 

lb51 TTGTTTGTGG CAATTCTGTG AATCTTCCTA TACCATCCAC CCCTGA AGAT 

10 1701 AAGGCCCACG TGAATGTGGA CATTGCTTGT GCTGTGGCTG ATGGGGAAAA 

1751 TGCCTTTCAG GGCCTAGAAC CCAAATTAGA GCCCCAGGAA CTATCTCCTC 

IflDl TCTCTGCTAC TGTTTTCCCG AAAGTGGAAC ATAGCCCAGG GCCTCCACTA 

1651 GCAGATGCAG AGTGCCAAGA AGGATTGTCA GAGAATAGTG CCTGTCGCTG 

nOl GACCGTTGTG AAAAGAGAGG AGGGGAGGCA AGCTCTGGAG CCGCTCCCTC 

15 1^51 AGGGCATCCA GGAGTCTCTA AACAACCCTA CCCCTGGGGA TTTAGAGGAA 

5001 ATTGTCAAGA TGGAACCTGA AGAAGCTAGA GAGGAAATCA GTGGATCCCC 

2DS1 TGAGCGTGAT ATTTGTGATG ACATCAAAGT GGAACATGCT GTGGAATTGG 

5101 ACACTGGTGC CCCAAGCGAG GAGTTGAGCA GTGCTGGAGA AGTAACGAAA 

5151 CAGACAGTCT TACAGAAGGA AGAGGAGAGG AGTCAGCCAA CTAAAACCCC 

20 55D1 TTCATCTTCT CAAGAGCCCC CTGATGAAGG AACCTCAGGG ACAGATGTGA 

5551 ACAAAGGATC ATCAAAGAAT GCTTTGTCCT CAATGGATCC TGAAGTGAGG 

53D1 CTTAGTAGCC CCCCAGGGAA GCCAGAAGAT TCATCCAGTG TTGATGGTCA 

5351 GTCAGTGGGG ACTCCAGTTG GGCCAGAAAC TGGAGGAGAG AAGAATGGGC 

SM01 CAGAAGAAGA GGAAGAAGAG GACTTTGATG ACCTCACCCA AGATGAGGAA 

25 5451 GATGAAATGT CATCAGCTTC TGAGGAATCT GTGCTTTCTG TCCCAGAACT 

55D1 CCAGGTGAGA GCTGGAGAAT ATTCTCAAGT ATTTCGTGGA CTCAGTAATA 

5551 TGTATCACTT ATTGATATGC CACCTGCTTG CTTGCTGCAC TATGGATAGT 

SkOl CCTAAAATCA TTTGTATTTG ATTTGTGAAT GCATTATGGG ACATGATTGT 

5b51 GGAGTTGAGG TGAAATGAGA TGGAAAGGAT GAAATTTTAC TTATTATATT 

30 5701 AAACTCGTTT ACACATTAAA AAAAAAAAAA AAAAAAAAAA AAAAAA AAGA 
5751 AAAAAAAAAA AAAAAAAA 



BLAST Results 

Entry RNNOTCHX from database EMBL:
Rat notch 2 mRNA.
Score = aifli P = l.be-5b-i identities = 51b/577 

Medline entries

No Medline entry

Peptide information for frame 3

ORF from 114 bp to 2618 bp; peptide length: 835
Category: putative protein
Classification: Differentiation/Development
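
The ORF coordinates, reading frame and peptide length reported for each clone are related by simple arithmetic, sketched below in Python. The sketch assumes 1-based inclusive coordinates and an end position that excludes the stop codon. That assumption is consistent with the figures for this clone (ORF from 114 bp to 2618 bp, frame 3, 835 residues). The function names are illustrative only.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of residues encoded by an ORF given 1-based inclusive coordinates.

    Assumes orf_end is the last base of the final sense codon (stop excluded).
    """
    span = orf_end - orf_start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


def reading_frame(orf_start: int) -> int:
    """Forward reading frame (1, 2 or 3) implied by a 1-based start position."""
    if orf_start < 1:
        raise ValueError("coordinates are 1-based")
    return (orf_start - 1) % 3 + 1


# Values taken from the peptide information above
print(peptide_length(114, 2618), reading_frame(114))
```

The same check can be applied to any other entry in the listing to confirm that a garbled coordinate has been read correctly, since the three reported values must agree.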

1 !1fl<3HV<2LLT<2 IHLLATCNPN LNPEATTTRI FLKELGTFA<3 SSIALHHtfYN 
51 PKF(3TLF<2PC NLflGAMGLIE DFSTHVSIDC SPHKTVKKTA NEFPCLPKGV
101 AWILATSKVF flYPELLPVCS LK AKNPflDKI VFTKAEDNLL ALGLKHFEGT

151 EFPNPLISKY LLTCKTAHQL TVRIKNLNMN RAPDNIIKFY KKTKflLPVLG 

SOI KCCEEI(2PH<2 UKPPIEREEH RLPFULKASL PSIflEELRHM ADGAREVGNH 

ESI TGTTEINSDR SLEKDNLELG SESRYPLLLP KGVVLKLKPV ATRFPRKAUR 

5 301 (2KRSSVLKPL LK3PSPSL0P SFNPGKTPAR STHSEAPPSK MVLRIPHPItJ 

351 PATVLflTVPG VPPLGVSGGE SFESPAALPA VPPEARTSFP LSESQTLLSS 

M01 APVPKVI1LPS LAPSKFRKPY VRRRPSKRRG VKASPCMKPA PVIHHPASVI 

M51 FTVPATTVKI VSLGGGCNm UPVNAAVAflS PUTIPITTLL VNPTSFPCPL 

5D1 NtJSLVASSVS PLIVSGNSVN LPIPSTPEDK AHVNVDIACA VADGENAFGG 

10 551 LEPKLEP(2EL SPLSATVFPK VEHSPGPPLA DAECC2EGLSE NSACRWTVVK 

tDl TEEGRflALEP LP<2GI<3ESLN NPTPGDLEEI VKMEPEEARE EISGSPERDI 

bSl CDDIKVEHAV ELDTGAPSEE LSSAGEVTK(2 TVLdKEEERS (2PTKTPSSS<2 

701 EPPDEGTSGT DVNKGSSKNA LSSMDPEVRL SSPPGKPEDS SSVDGflSVGT 

751 PVGPETGGEK NGPEEEEEED FDDLTflDEED EMSSASEESV LSVPELdVRA 

15 SOI GEYS(2VFRGL SNMYHLLICH LLACCTtlDSP KIICI 



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphamy2_1i24, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphamy2_1i24, frame 3

Report for DKFZphamy2_1i24.3

[LENGTH] 872

[pI] 5.67

DTHOMOLl PIR:S4fli47fl glucan 1-, 4-alpha-glucosidase (EC 

3.2-1.3) - yeast ( Saccharomyces cerevisiae) 5e-0b 
EFUNCATID 3D- 01 organization of cell wall ES- cerevisiae-. 

YIRDncl 2e-07 

40 EFUNCAT3 30-10 extracellular/secretion proteins CS- cerevisiae-. 

YIROllcJ 2e-D7 

CFUNCATJ 01.05-D1 carbohydrate utilization IS- cerevisiae-. 

YIR011c3 2e-07 

CFUNCAT3 02-10 tricarboxylic-acid pathway ES. cerevisiae-. 

45 YDRlMflcl 5e-0M 

CFUNCAT3 3D.lt. mitochondrial organization ICS - cerevisiae-. 

YDRlMacD Se-OM 
IKbO Alpha_Beta 

IKIiO LOli)_COHPLEXITY 1.i|0 X 






SE<3 KSSAK(2L(2EVEKVICP(2SEKVH(3TLILDPA(2RKRL(2(3(3na(2HV(2LLTi3IHLLATCNPNLNP 

SEG • xxxxxxxxxxxxxx 

PRD ccchhhhhhhhhhhccccchhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhcccccccc 

SE<3 EATTTRIFLKELGTFAOSSIALHHflYNPKFQTLFflPCNLnGAn<2LIEDFSTHVSIDCSPH 

SEG - 

PRD cchhhhhhhhhhhhhhhhhhhhhcccccceeeeecccchhhhhhhhhhceeeeeeccccc
SEQ KTVKKTANEFPCLPK(3V AUILATSKVFM YPELLPVCSLKAKNPfiDKIVFTKAEDNLLALG

SEC - 

PRD eeeeeccccccccchhhhhhhccceeeecccccccccccccccceeeeeeeccchhhhhh 

5 

SE(2 LKHFEGTEFPNPLISKYLLTCKTAH(2LTVRIKNLNriNRAPDNIIKFYKKTK(3LPVLGKCC 

SEG 

PRD hheeecccccccceeeeeeeehhhhhhhhheGGCccccccccGeeeeeccccccccceee 

10 SE<2 EEIr3PH(3UKPPIEREEHRLPFULKASLPSIt3EELRHMADGAREVGNriTGTTEINSI>RSLE 

SEG 

PRD eeecccccccccchhhhhcceeeGcchhhhhhhhhhhhhhhhhhhccccccccccccGee 

SEfl KDNLELGSESRYPLLLPKGVVLKLKPVATRFPRKAWR(3K:RSSVLKPLLI(3PSPSL(3PSFN 

15 SEG . . . - xxxxxxxxxxxxxxx xxxxxxxxxxxxxxx. . 

PRD GCCCCCCCCCCCCCCCCCCGGGGGGGGGGGCCChhhhhhcCCCCCCCCCCCCCCCCCCCC 

SE(2 PGKTPARSTHSEAPPSKMVLRIPHPIt2PATVL(2TVPGVPPLGVSGGESFESPAALPAVPP 

SEG • • xxxxxxxxxxxx • 

20 PRD CCCCCCCCCCCCCCCCCCGGGCCCCCCGGGGGGCCCCCCCCCICCCCCCCCCCCCCCCCCC 

SE<2 EARTSFPLSES(2TLLSSAPVPKVI1LPSLAPSKFRKPYVRRRPSICRRGVKASPCMKPAPVI 

SEG - 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

25 

SE<2 HHPASVIFTVPATTVKIVSLGGGCNni(3PVNAAVAr3SP(2TIPITTLLVNPTSFPCPLNc3S 

SEG 

PRD ccccceeecccccceeeeeccccccccccccccccccccccccceeeccccccccccccc 

30 SE<2 LVASSVSPLIVSGNSVNLPIPSTPEDKAHVNVDIACAVADGENAFflGLEPKLEPtSELSPL 

SEG 

PRD ccccccccccccccccccccccccccccccccccceeecccccccccccccccccccccc 

SE(3 SATVFPKVEHSPGPPLAD AECt3EGLSENSACRUTVVKTEEGRt2ALEPLP(3GIt2ESLNNPT 

35 SEG • 

PRD ccccccccccccccccccccccccccccccceeeGeecccccccccccccceeeeccccc 

SE<2 PGDLEEIVKHEPEEAREEISGSPERDICDDIKVEHA VELDTGAPSEELSSAGEVTKi2TVL 

SEG 

40 PRD ccccccccccccccceeeccccccccccccccccccccccccccccccccccccccchhh 

SE<3 l3KEEERS<2PTKTPSSS(2EPPDEGTSGTDVNKGSSKNALSSMDPEVRLSSPPGKPEDSSSV 

SEG 

PRD hhhhhhcccccccccccccccccccccccccccccccccccccccccccccccccccccc 

45 

SEt3 DG(2SVGTPVGPETGGEKNGPEEEEEEDFDDLT<2 DEED EMS SASEESVLSVPELC2VR AGE Y 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccchhhhhhhcccchhhhhhhhhhcccccccccceGGeecccc 

50 SE<2 StfVFRGLSNnYHLLICHLLACCTMDSPKIICI 

SEG 

PRD eGeGeehhhhhhhhhhhhhhhhcccccccccc 

(No Prosite data available for DKFZphamy2_1i24.3)
(No Pfam data available for DKFZphamy2_1i24.3)

DKFZphamy2_1j19

group: differentiation/development

DKFZphamy2_1j19 encodes a novel 150 amino acid protein with high
similarity to the allograft inflammatory factor-1 of Cyprinus
carpio.

Allograft inflammatory factor-1 (AIF-1) is a protein involved in
allograft rejection. In experimental autoimmune encephalomyelitis
(EAE), neuritis (EAN) and uveitis (EAU) it is produced by
macrophages and microglia cells.

The new protein can find clinical application in the development
of tools to enhance the compatibility of transplanted tissues as
well as in expression profiling of autoimmune diseases and
infections.

strong similarity to allograft inflammatory factor-1 (Cyprinus
carpio)

identical to DKFZphamyE_lnl 
Sequenced by MediGenomix 

Locus: /map = T? 50H. c i cR from top of Chr^ linkage group" 
Insert length: 3381 bp

Poly A stretch at pos. 3365, polyadenylation signal at pos. 3344



1 GCCGGAGCCC GGACCAGGCG CCTGTGCCTC CTCCTCGTCC CTCGCCGCGT 

51 CCGCGAAGCC TGGAGCCGGC GGGAGCCCCG CGCTCGCCAT GTCGGGCGAG 

101 CTCAGCAACA GGTTCCAAGG AGGGAAGGCG TTCGGCTTGC TCAAAGCCCG 

151 GCAGGAGAGG AGGCTGGCCG AGATCAACCG GGAGTTTCTG TGTGACCAGA 

201 AGTACAGTGA TGAAGAGAAC CTTCCAGAAA AGCTCACAGC CTTCAAAGAG 

551 AAGTACATGG AGTTTGACCT GAACAATGAA GGCGAGATTG ACCTGATGTC 

301 TTTAAAGAGG ATGATGGAGA AGCTTGGTGT CCCCAAGACC CACCTGGAGA 

351 TGAAGAAGAT GATCTCAGAG GTGACAGGAG GGGTCAGTGA CACTATATCC 

i»Dl TACCGAGACT TTGTGAACAT GATGCTGGGG AAACGGTCGG CTGTCCTCAA 

M51 GTTAGTCATG ATGTTTGAAG GAAAAGCCAA CGAGAGCAGC CCCAAGCCAG 

501 TTGGCCCCCC TCCAGAGAGA GACATTGCTA GCCTGCCCTG AGGACCCCGC 

551 CTGGACTCCC CAGCCTTCCC ACCCCATACC TCCCTCCCGA TCTTGCTGCC 

L>01 CTTCTTGACA CACTGTGATC TCTCTCTCTC TCATTTGTTT GGTCATTGAG 

L51 GGTTTGTTTG TGTTTTCATC AATGTCTTTG TAAAGCACAA ATTATCTGCC 

701 TTAAAGGGGC TCTGGGTCGG GGAATCCTGA GCCTTGGGTC CCCTCCCTCT 

751 CTTCTTCCCT CCTTCCCCGC TCCCTGTGCA GAAGGGCTGA TATCAAACCA 

601 AAAACTAGAG GGGGCAGGGC CAGGGCAGGG AGGCTTCCAG CCTGTGTTCC 

651 CCTCACTTGG AGGAACCAGC ACTCTCCATC CTTTCAGAAA GTCTCCAAGC 

101 CAAGTTCAGG CTCACTGACC TGGCTCTGAC GAGGACCCCA GGCCACTCTG 

T51 AGAAGACCTT GGAGTAGGGA CAAGGCTGCA GGGCCTCTTT CGGGTTTCCT 

1001 TGGACAGTGC CATGGTTCCA GTGCTCTGGT GTCACCCAGG ACACAGCCAC
1051 TCGGGGCCCC GCTGCCCCAG CTGATCCCCA CTCATTCCAC ACCTCTTCTC
1101 ATCCTCAGTG ATGTGAAGGT GGGAAGGAAA GGAGCTTGGC ATTGGGAGCC 
1151 CTTCAAGAAG GTACCAGAAG GAACCCTCCA GTCCTGCTCT CTGGCCACAC 
1201 CTGTGCAGGC AGCTGAGAGG CAGCGTGCAG CCCTACTGTC CCTTACTGGG 
5 1551 GCAGCAGAGG GCTTCGGAGG CAGAAGTGAG GCCTGGGGTT TGGGGGGAAA 
1301 GGTCAGCTCA GTGCTGTTCC ACCTTTTAGG GAGGAT ACTG AGGGGACCAG 
1351 GATGGGAGAA TGAGGAGTAA AATGCTCACG GCAAAGTCAG CAGCACTGGT 
1H01 AAGCCAAGAC TGAGAAATAC AAGGTTGCTT GTCTGACCCC AATCTGCTTG 
mSl AAACCTGACT CTGCTTCTCT CATTTGTCTT CCTACCCTAC TCACATA ATT 

10 1501 CACTCATTGA CTCACTCATT CACCAGATAT TTATTGACCT GCTATTATAA 
1551 GCTTTACATC CTCCCATGTT GTCCTGGCAT GTGCAGTATA CACGGTCTAA 
IbOl CTCATCTCTC CCCAGATCTC TCAGAACCTT GAGCTTGGGA ATTGAACTGG 
IbSl GGTCACCTGT GTCGTTTCTT ATGGACTCGC AGGATTTTAG AACCCTAATG 
17Q1 CACCCTGGAG GGTAGCTGGG CCAGACTTCT CATTTCACAG GTGAGGAGAC 

15 1751 TGGTGCCCCA CAGGGATTAA GTGCCTTGCC CAAGGTCAGG CTTATCTCCA 
IfiOl GAGGGAGGTG CCCTGGACTG GGGCCCAGAT GTTCAGGGAC CCTGCCTACA 
1551 CCTCATTTCC AGTGTGGGCT GCCTTAGTTA GTTATGAGA A CAGGGAAGGG 
MDl CTGGGAAGAG ACAGCCTCCA AGGTCAACAC TTGGAGAGGG TTTCACTTGC 
1T51 TCTGAAGACC CTGGTCCAGG ATTCGCCCTC TCCCATGCCT TCAAGTCAGC 

20 2001 ATCAGGCTTA GGGCAAAGAC CAGGCCTCTG AAGCTGCCTC TTGTAATTCA 
2051 TGCAGGAAGA TGTCAAAGTC AGCCCCATCT TGGCTGATCA GGGTGTTCAG 
2101 CCTTAACCCC ACCTGTGTTC TGAAGTCTCT TACCCTACCT GCTCAGGACT 
E151 GAGACAGTTA TTCACTGAAC ATATTTATTA AGCACTTGCT GTAGGCCAAC 
2201 AGTTAAGAAT CCA ATAATGA AATGGACAGA TTCATGGAAC TTAGAGTCCA 

25 2251 ATAGGAAAGT GAGACCCAGA CAATGACAAT GAGATAAATG TTAGGAAGGG 
2301 GGAGGTATGG GGTGACTTCC CTGCAGTCCT GGGGGCCTAC ATGGGCCCAA 
2351 GACTGGGTGA GAGTCTTGGC AGAGCCTTTG CAACACCTTA AGTGGACAGG 
2L401 ACTGGGAGGT CTTGGTGGTT GGAGCCAACG TGGGTTCCCT GCGGCTCCTT 
2M51 AGTCACCTCT GATAGCAGAT TGAGGGAGGA AA ACAGGTA A GGCATGAGGA 

30 2501 AATGGCCAGG TTGGGTTAAC CCACTGGTTT CAACCAGTTC AGGAATGAGG . 
2551 TTATTTGGCC ATGACTGGCT GATCTTGAGC TCAAGGATCT GCTTCAA ATG 
2b01 CACACAGGCC TAGTTGAAGT TT A A ACCCC A GCA A A ACATT CCTCCCTGTA 
2b51 AATGGAAAAT CCTACTTCTA CCCCCACCCT GCCCTGTTTT TTGTTTTTTT 
2701 TTTCCCCAAG ATCATTAGAT GTCCTCACCC CTCCTCACTG CCTCTCCTCT 

35 2751 CTGGGACAGG CTGGGACCTT TGAGGA AGAT AA AGCCTTCC TTGACTACCC 
2fl01 ATCATATTCA GTGTCCCTGT TCCTCACTCA GAGAGGAAGG CAGAACCAGT 
2A51 CAGGCTTATT TCAGTAAGTT CCACAGTTCT ACAAGACTGC AGGAATTCTC 
2*101 CTTAAGGGAG GAGAGCAAGC AGGTGTGGCC CCAGCTTCTG GAAATGGCAG 
2=151 AAGAGAGGGT TTTCTCATTG AATGGGGGTG GGGGCTCGTG TGTCCTGGGA 

40 3001 AACCCCATCA GTCCCTTCAT TTCTTGAGAC TCAACTCCTG GGAGGAGAGG 
3051 GTCTCA AGAG TTGTCCCTGG AAGGAGGGCG GGGGCAGTCT GCATCTATTT 
3101 CAGGTTGTGG CTCTTGGTTC TAGGACTCTT ACTTCTCTGG CTAAGGGCTC 
3151 AGCTTCTTGG GACTTCAACC ATCTTCTTTC TGAAAGACCA AATCTAATGT 
3201 AACCAGTAAC GTGAGGACTG CCAAGTATGG CTTTGTCCCT ATGACTCAGA 

45 3251 GGAGGGTTTG TCGGGCAAAT TCAGGTGGAT GAAGTATGTG TGTGCGTGTG 
3301 CATGGGAGTG TGCGTGGACT GGGATATCAT CTCTACAGCC TGCAAATAAA 
3351 CCAGACAAAC TTAAAAAAAA AAAAAAAAAA A 
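
The poly(A) stretch and polyadenylation signal positions reported with each insert can be approximated by a simple scan of the sequence. The Python sketch below is an assumption about how such positions might be located, not the method used to produce this report. It finds the first run of at least `min_run` A residues, then the last canonical AATAAA hexamer lying entirely upstream of that run. Positions are returned 1-based and relative to the string supplied, so they may differ by a base or two from the reported values if another convention was used.

```python
import re
from typing import Optional, Tuple


def find_polya(
    seq: str, min_run: int = 15, signal: str = "AATAAA"
) -> Tuple[Optional[int], Optional[int]]:
    """Return 1-based (poly_a_start, signal_start); None where nothing is found."""
    # Sequence listings are printed in blocks of ten bases, so drop whitespace.
    seq = re.sub(r"\s+", "", seq).upper()
    run = re.search("A{%d,}" % min_run, seq)
    if run is None:
        return None, None
    poly_a_start = run.start() + 1
    # Last occurrence of the hexamer that ends before the poly(A) run begins.
    sig = seq.rfind(signal, 0, run.start())
    signal_start = sig + 1 if sig >= 0 else None
    return poly_a_start, signal_start


# Last 51 bases of the insert above (positions 3331 to 3381)
tail = "CTCTACAGCC TGCAAATAAA CCAGACAAAC TTAAAAAAAA AAAAAAAAAA A"
offset = 3330
poly_a, sig = find_polya(tail)
print("poly(A) run starts at", poly_a + offset, "signal starts at", sig + offset)
```

Only the canonical AATAAA hexamer is searched for here. Variant signals such as ATTAAA would need to be passed explicitly through the `signal` parameter.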

BLAST Results

Entry AB012309_1 from database TREMBL:
product: "allograft inflammatory factor-1"; Cyprinus carpio mRNA for
allograft inflammatory factor-1, complete cds.
Score = 575, P = 3.7e-59, identities = 113/146, positives = 128/146

Medline entries

No Medline entry

Peptide information for frame 2

ORF from 89 bp to 538 bp; peptide length: 150
Category: strong similarity to known protein
Classification: unclassified

1 MSGELSNRFQ GGKAFGLLKA RQERRLAEIN REFLCDQKYS DEENLPEKLT
51 AFKEKYMEFD LNNEGEIDLM SLKRMMEKLG VPKTHLEMKK MISEVTGGVS
101 DTISYRDFVN MMLGKRSAVL KLVMMFEGKA NESSPKPVGP PPERDIASLP



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphamy2_1j19, frame 2

No Alert BLASTP hits found

Pedant information for DKFZphamy2_1j19, frame 2

Report for DKFZphamy2_1j19.2

[LENGTH] 150
40 EMU J 17Db7.Bfe 
EpI3 b-b3 

EH0M0L3 TREMBL: ABD123D C 1_1 product: "allograft inflammatory 

factor-l n i Cyprinus carpio mRNA for allograft inflammatory 
factor-li complete cds- 26-5^ 
45 EFUNCATD 3D - DH organization of cytoskeleton ES. cerevisiae-. 

YBRlOTc J Se-DM 

EFUNCATJ 03- D7 pheromone response! mating-type determinationi 
sex-specific proteins IS- cereyisiaei YBRlOTcJ Se-OH 
EFUNCATD Ofl.n cellular import IS- cerevisiae-. YBRlO^cl Se-DH 
50 EFUNCATJ 10-02.^ other morphogenet ic activities ES. 
cerevisiae-, YBRlD^cl Se-OM 

EFUNCATD D3-22 cell cycle control and mitosis ES - cerevisiae-i 
YBRlCHcD Se-DM 

EFUNCATD 03-DH budding-, cell polarity and filament formation 
55 ES- cerevisiaei YBRlO^cD 5e-DM 

EFUNCATD 03-01 cell growth ES. cerevisiae-. YBRlQ^cD Se-OH 
EFUNCATD 3D-D5 organization of centrosome ES. cerevisiaei 
YBRlD^cD Se-OM
[SCOP] d2mysb_ 1-37.1.5-15 Myosin Essential Chain Myosin

Regulatory Chai 5e-20 

QISCOPID dlwdcb_ 1.37-1-5-m Myosin Essential Chain Myosin 

Regulatory Chai 3e-D5 

5 CSC0P3 dlosa 1-37. 1*5-13 Calmodulin IE (Paramecium 

tetraurelia) 3e-lfc> 

ISCOPJ dlauib_ 1 - 37 • 1 - 5- 11 Calcineurin regulatory subunit 

(B-chain Ee-lb - 

CPIRKUJ duplication 7e-0b 

10 EPIRKIO mitosis 7e-Db 

CPIRKbO calcium binding 7e-0b 

EPIRKbD EF hand 7e-0b 

EPIRKtO cell division 7e-Db 

ESUPFAM3 unassigned calmodulin-related proteins 3e-M7 
15 ESUPFAMID calmodulin ?e-E3fc, 

ESUPFAM31 calmodulin repeat homology 3e-H7 
CKU1 All_Alpha 
EKIO 3D 


SEQ MSGELSNRFQGGKAFGLLKARQERRLAEINREFLCDQKYSDEENLPEKLTAFKEKYMEFD
Ictr-

• - HHHHHHHHHHHHHT

SEQ LNNEGEIDLMSLKRMMEKLGVPKTHLEMKKMISEVTGGVSDTISYRDFVNMMLGKRSAVL
Ictr-

TTTTTCBCHHHHHHHHHHTTTCCCHHHHHHHHHHCTTTTCCCBCHHHHHHHHCCTTTHHH

SEQ KLVMMFEGKANESSPKPVGPPPERDIASLP
Ictr- HHHHHHTTTTC



(No Prosite data available for DKFZphamy2_1j19.2)
(No Pfam data available for DKFZphamy2_1j19.2)

group: cell cycle

DKFZphamy2_24b4 encodes a novel 698 amino acid protein with
similarity to human STIM1.

The stromal interaction molecule 1 gene (STIM1) encodes a type I
trans-membrane protein of unknown function, which induces growth
arrest and degeneration of the human tumor cell lines G401 and RD
but not HBL100 and CaLu-6, suggesting a role in the pathogenesis
of rhabdomyosarcomas and rhabdoid tumors. There is also strong
similarity to a Mus musculus stromal cell protein, which
selectively increases interleukin 7-dependent proliferation of
pre-B cells. The novel protein contains 1 transmembrane domain.

The new protein can find application in modulation of tumour
growth.

similarity to STIM1 (Homo sapiens)

probably differential polyadenylation; cf. EST-BLAST file.
perhaps complete cds.

Pedant: SIGNAL_PEPTIDE and TRANSMEMBRANE 1






Sequenced by GBF 

Locus: /map="13 c l.S cR from top of ChrH linkage group* 



Insert length: 3305 bp 

Poly A stretch at pos- 327 1 *-. polyadenylation signal at pos- 32b0 

1 GGCGCCTTCA TCCCGCCTCG ACTCCTGGCC CAGCGTGGGG CTGGCTGCTG 

51 CGGCGGCGGC GCTGGGCTGC GTTGCTGGTG CTCGGGCTGC TGGTACCCGG 

101 AGCGGCGGAC GGATGCGAGC TTGTGCCCCG GCACCTCCGC GGGQGGQGGG 

40 151 CGACTGGCTC TGCCGCAACT GCCGCCTCCT CTCCCGCCGC GGCGGCCGGC 

2D1 GATAGCCCGG CGCTCATGAC AGATCCCTGC ATGTCACTGA GTCCACCATG 

251 CTTT ACAGA A GAAGACAGAT TTAGTCTGGA AGCTCTTCAA ACAATACATA 

3D1 AACAAATGGA TGATGACAAA GATGGTGGAA TTGAAGTAGA GGAAAGTGAT 

351 GAATTCATCA GAGAAGATAT GAAATATAAA GATGCTACTA ATAAACACAG 

45 i*Dl CCATCTGCAC AGAGAAGATA AACATATAAC GATTGAGGAT TTATGGAAAC 

M51 GATGGAAAAC ATCAGAAGTT CATAATTGGA CCCTTGAAGA CACTCTTCAG 

5D1 TGGTTGATAG AGTTTGTTGA ACTACCCCAA TATGAGAAGA ATTTTAGAGA 

551 CAACAATGTC AAAGGAACGA CACTTCCCAG GATAGCAGTG CACGAACCTT 

LD1 CATTTATGAT CTCCCAGTTG AAAATCAGTG ACCGGAGTCA CAGACAAAAA 

50 b51 CTTCAGCTCA AGGCATTGGA TGTGGTTTTG TTTGGACCTC TAACACGCCC 

7D1 ACCTCATAAC TGGATGAAAG ATTTTATCCT CACAGTTTCT ATAGTAATTG 

751 GTGTTGGAGG CTGCTGGTTT GCTTATACGC AGAATAAGAC ATCAAAAGAA 

AD1 CATGTTGCAA AAATGATGAA AGATTTAGAG AGCTTACAAA CTGCAGAGCA 

651 AAGTCTAATG GACTTACAAG AGAGGCTTGA AAAGGCACAG GAAGAAAACA 

55 ^E31 GAAATGTTGC TGTAGAAAAG CAAAATTTAG AGCGCA AAAT GATGGATGAA 

=151 ATCAATTATG CAAAGGAGGA GGCTTGTCGG CTGAGAGAGC TAAGGGAGGG 

10D1 AGCTGAATGT GAATTGAGTA GACGTCAGTA TGCAGA ACAG GAATTGGAAC 

1051 AGGTTCGCAT GGCTCTGAAA AAGGCCGAAA AAGAATTTGA ACTGAGAAGC
1101 AGTTGGTCTG TTCCAGATGC ACTTCAGAAA TGGCTTCAGT TAACACATGA
1151 AGTAGAAGTG CAATACTACA ATATTAAAAG ACAAAACGCT GAAATGCAGC 
1201 TAGCTATTGC TAAAGA.TGAG GCAGAAAAAA TTAAAAAGAA GAGA AGCACA 
1251 GTCTTTGGGA CTCTGCACGT TGCACACAGC TCCTCCCTAG ATGAGGTAGA 
5 1301 CCACAAAATT CTGGAAGCAA AGAAAGCTCT CTCTGAGTTG ACAACTTGTT 
1351 TACGAGAACG ACTTTTTCGC TGGCAACAAA TTGAGAAGAT CTGTGGCTTT 
1401 CAGATAGCCC ATAACTCAGG ACTCCCCAGC CTGACCTCTT CCCTTTATTC 
1451 TGATCACAGC TGGGTGGTGA TGCCCAGAGT CTCCATTCCA CCCTATCCAA 
1501 TTGCTGGAGG AGTTGATGAC TTAGATGAAG ACACACCCCC AATAGTGTCA 

10 1551 CAATTTCCCG GGACCATGGC TAAACCTCCT GGATCATTAG CCAGAAGCAG 
IbOl CAGCCTGTGC CGTTCACGCC GCAGCATTGT GCCGTCCTCG CCTCAGCCTC 
IbSl AGCGAGCTCA GCTTGCTCCA CACGCCCCCC ACCCGTCACA CCCTCGGCAC 
1701 CCTCACCACC CGCAACACAC ACCACACTCC TTGCCTTCCC CTGATGCAGA 
1751 TATCCTCTCA GTGTCAAGTT GCCCTGCGCT TTATCGAAAT GAAGAGGAGG 

15 IflOl AAGAGGCCAT TTACTTCTCT GCTGAAAAGC AATGGGAAGT GCCAGACACA 
1851 GCTTCAGAAT GTGACTCCTT AAATTCTTCC ATTGGAAGGA AACAGTCTCC 
nOl TCCTTTAAGC CTCGAGATAT ACCAAACATT ATCTCCGCGA AAGATATCAA 
ITSl GAGATGAGGT GTCCCTAGAG GATTCCTCCC GAGGGGATTC GCCTGTAACT 
2001 GTGGATGTGT CTTGGGGTTC TCCCGACTGT GTAGGTCTGA CAGAAACTAA 

20 2051 GAGTATGATC TTCAGTCCTG CAAGCAAAGT GTACAATGGC ATTTTGGAGA 
2101 AATCCTGTAG CATGAACCAG CTTTCCAGTG GCATCCCGGT GCCTAAACCT 
2151 CGCCACACAT CATGTTCCTC AGCTGGCAAC GACAGTAAAC CAGTTCAGGA 
2201 AGCCCCAAGT GTTGCCAGAA TAAGCAGCAT CCCACATGAC CTTTGTCATA 
2251 ATGGAGAGAA AAGCAAAAAG GCATCAAAAA TCAAAAGCCT TTTTAAGAAG 

25 2301 AAATCTAAGT GAACTGGCTG ACTTGATGGA ATCATGTTCA AGTGGCATCT 
2351 GTAAACTATT ATCCCCCACC CTCCACTCCC CACCTTTTTT TTGGTTTAAT 
2401 TTTAGGAATG TAACTCCATT GGGGCTTTCC AGGCCGGATG CCATAGTGGA 
2451 ACATCCAGAA GGGCAACTGT CTACTGTCTG CTTATTTAAG TGACTATATA 
2501 TAATCAATTC ATCAAGCCAG TTATTACTGA AAAATCATTG AAATGAGACA 

30 2551 GTTTACAGTC ATTTCTGCCT ATTTATTTCT GCTTTGTTCT CAGTGATGTA 
2b01 TATGCAACAT TTTGTTGAAA GCCACGATGG ACTTACAAGC TTTA ATGGAC 
2b51 TCGTAAGCCA GCATGGGCTT GCA AAAATTT CTTGTTTACC AGAGC ATCTT 
2701 CTTATCTTTC CACAGAGCTA TTTACATCCT GGACTATATA ACTTAAAAGA 
2751 AGTAAAACGT AATTGCACTA CTGTTTTCCA GACTGGAAAA AAAAAAAAAT 

35 2fl01 CTCTGCAAGT GAAACTGTAT AGAGTTTATA AAATGACTAT GGATAGGGGA 
2A51 CTGTTTTCAC TTTTAGATCA AAATGGGTTT TTAAGTAGAA CCTAGGGTTT 
2^01 CTAATTGACT TGATTTCTGG AAATGA AAAC CCGCGCTTTT ATTATGGGAA 
2^51 GCTTCTTGAA CTGCATTTAC TATTGTGAAG TTTCAAGTCC CGCTGTAAAG 
3001 ATCATGTTGT TTTGTTTTCC CCAGGGCTTT CACTGTGATT TACTGCATTG 

40 3051 CAGGCTGTAT GATAAAACAC ACATAATTTA AAGAGAGAAG GCTCTTGATT 
3101 CCTTATGCAA GTGGAAGAGT TGAAACTTGA TTGAAGGACT TAAAACATTC 
3151 ACAACCTTAA GCCGAGGTGG GGGGATATGG GGATTCAGGC AGTTGTTT AC 
3201 ACACTTTGAA TAACTGCAAA GGATTTACGG TTTGTGAAAA ATGTGTACTG 
3251 TGGAAAAGAT AATAAATTGA AGACATTAAA AAAAAGAAAA AAAAAAAAAA 

45 3301 AAAAA 



BLAST Results 



Entry HS5242610_1 from database TREMBL:
gene: "STIM1"; product: "GOK"; Homo sapiens GOK (STIM1) mRNA,
complete cds.
Score = 1397, P = 4.2e-142, identities = 275/447, positives =
336/447, frame +3

Entry MMl_m7323_l from database TREMBL:

product: "stromal cell protein"i Mus musculus stromal cell 
protein 

mRNA-i complete cds- 
5 Score = 13^-. P = a.&e-m2i identities = 27^/^^l^ positives = 
33b/MM7^ 
frame +3 

Entry HS^ITS^ from database EflBL: 
10 human STS EST1L.7M7T . 

Score = 13TQ-, P = T-le-57-, identities = 2flM/2fi7 



15 Medline entries 



c J7D7^b c 12: 

Parker NJ, Begley CG, Smith PJ, Fox RM; Molecular cloning of a novel
human gene (D11S4896E) at chromosomal region 11p15.5. Genomics 1996
Oct 15;37(2):253-6
c Jb32bbfl02 

Oritani K, Kincade PW; Identification of stromal cell products that
interact with pre-B cells. J Cell Biol 1996 Aug;134(3):771-82


Peptide information for frame 3 



ORF from 216 bp to 2309 bp; peptide length: 698
Category: similarity to known protein
Classification: Cell signaling/communication
Prosite motifs: RGD (589-591)



1 MTDPCMSLSP PCFTEEDRFS LEALl2TIHK<3 MDDDKDGGIC VEESDEFIRE 

51 DMKYKDATNK HSHLHREDKH ITIEDLWKRU KTSEVHNUTL EDTLdblLIEF 

101 VELPiSYEKNF RDNNVKGTTL PRIAVHEPSF MISSLKISDR SHR<2KL<2LKA 

45 151 LDVVLFGPLT RPPHNWMKDF ILTVSIVIGV GGCUFA YTtfN KTSKEHVAKM 

201 MKDLESLtfTA E(2SLMDL<2ER LEKAdEENRN VAVEKflNLER KMMDEINYAK 

251 EEACRLRELR EGAECELSRR Q YAEtfELEtfV RMALKKAEKE FELRSSUSVP 

301 DALflKbJL<2LT HEVEVflYYNI KR<2NAEM(2LA IAKDEAEKIK KKRSTVFGTL 

351 HVAHSSSLDE VDHKILEAKK ALSELTTCLR ERLFRUddlE KICGFdIAHN 

50 m]l SGLPSLTSSL YSDHSWVVMP RVSIPPYPIA GGVDDLDEDT PPIVSdFPGT 

M51 MAKPPGSLAR SSSLCRSRRS IVPSSPdPdR AflLAPHAPHP SHPRHPHHPfl 

501 HTPHSLPSPD PDILSVSSCP ALYRNEEEEE AIYFSAEKdU EVPDTASECD 

551 SLNSSIGRKS SPPLSLEIYd TLSPRKISRD EVSLEDSSRG DSPVTVPVSU 

bOl GSPDCVGLTE TKSMIFSPAS KVYNGILEKS CSMNflLSSGI PVPKPRHTSC 

651 SSAGNDSKPV QEAPSVARIS SIPHDLCHNG EKSKKPSKIK SLFKKKSK

BLASTP hits

No BLASTP hits available
Alert BLASTP hits for DKFZphamy2_24b4, frame 3

No Alert BLASTP hits found

Pedant information for DKFZphamy2_24b4, frame 3

Report for DKFZphamy2_24b4.3

[LENGTH] 769

EMtO fihb73.M^ 
CpIJ b.ti^ 

EHOMOLJ TREMBL:HS52M2blD_l gene: "STIM1"; product: n G0K"^ 

Homo sapiens GOK (STIM1) rnRNAi complete cds. le-ISM 
20 EBL0CKS3 BLODfifitC I>ihydroxy-acid and b-phosphogluconate 
dehydratases proteins 
CBLOCKSID PR00D51D 
EBLOCKSJ PR010S3F 

EBL0CKS3 BL0D72bB AP endonudeases family 1 proteins 
25 EPROSITEJ RGD 1 

EKIO SIGNAL_PEPTIDE 3fi 

EKliD TRANSMEMBRANE 1 

EKIO L0U_C0MPLEXITY 15-flfc, X 

CKU3 C0ILEJ>_C0IL fl-MS X 

30 

SEt3 RLHPASTPGPAWGULLRRRRWAALLVLGLLVPGA ADGCELVPRHLRGRRATGSA ATAASS 
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxx 

PRD cccccccccccchhhhhhhhhhhhhhhhhcccccccccccchhhhhhhcccccccccccc 
35 COILS 



MEM 



SEfl PAAAAGDSPALMTDPCMSLSPPCFTEEDRFSLEALflTIHKdMDDDKDGGIEVEESDEFIR 
40 SEG xxxxxxxxxx 

PRD ccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhcccccceeeecchhhhh 
COILS 



MEM 

45 

SE<2 EDMKYK3>ATNKHSHLHREl)KHITIEI>LWKRUKTSEVHNUTLEDTL(3LlLIEFVELPflYEKN 

^* c r 



PRD hhcccccccccccccccccceeeehhhhhhhhhhccccchhhhhhhhhhhhhhcccchhh 
COILS 

50 

MEM 

SEfl FRDNNVK6TTLPRIAVHEPSFMIS(3LKISDRSHR<3KL<2LKALDVVLFGPLTRPPHNUMKD 



55 PRD hhhhhcccccccegeeecccceeeGeecccchhhhhhhhhhheeeecccccccccccchh 
COILS 



MEM
SEQ FILTVSIVIGVGGCUFAYTi2NKTSKEMAKMMKDLESL(2TAE<2SLMDL<3ERLEKA<3EENR
SEG 

PRD hhheeeeeeccccceeeecccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcc 
5 COILS 

ccccccccccccccccccccccccccccccccc 

riEM nnririfinnfinriritirinrirni 

SEfl NVAVEK<2NLERKMMDEINYAKEEACRLRELREGAECELSRRi2YAE(2ELE<2VRMALKI<:AEK 
10 SEG . . 

PRD ceeeehhhhhhhhhhhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

MEM 


SEfl EFELRSSUSVPDAL<2KUL(2LTHEVEV(2YYNIKR<3NAEMl2LAIAKDEAEKIKKKRSTVFGT 
SEG ••••••«. xxxxxxxxxxxxx • 

PRI> hhhhhccccccchhhhhhhhhhhheeeeccchhhhhhhhhhhhhhhhhhhhhhhhhccce 
COILS 


MEM 

SE(3 LHVAHSSSLDEVDHKILEAKKALSELTTCLRERLFRbl(3t3IEKICGFi3IAHNSGLPSLTSS 
SEG 

25 PRD eeeeeccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcceeeeccccccceee 
COILS 



MEM 



30 SEfl LYSDHSUVVMPRVSIPPYPIAGGVDDLDEDTPPIVSflFPGTMAKPPGSLARSSSLCRSRR 

xxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
COILS 



35 MEM 



SEU SIVPSSPt2P(2RAi3LAPHAPHPSHPRHPHHPl2HTPHSLPSPDPDILSVSSCPALYRNEEEE 
SEG X xxxxxxxxxxxxxxxxxxxxxxxxx xxxx 

PRD eeecccccccccccccccccccccccccccccccccccccccceeeeeecccchhhhhhh 
40 COILS 



MEM 



SEfl EAIYFSAEK(JldEVPDTASECDSLNSSIGRKl3SPPLSLEIYl2TLSPRICISRDEVSLEDSSR 
45 SEG x- 

PRD hhhhhhhhhhcccccccccccccccccccccccceeeeeeeecccccccccccccccccc 
COILS 

MEM ! ! '. '. \\ 


SEC! GDSPVTVDVSWGSPDCVGLTETKSMIFSPASKVYNGILEKSCSMNflLSSGIPVPKPRHTS 
SEG . . 

PRD cccceeeeeccccccccceeeccccccccccceeeeeeeccccccccccccccccccccc 
COILS 


MEM . 



SEQ CSSAGNDSKPVQEAPSVARISSIPHDLCHNGEKSKKPSKIKSLFKKKSK
SEG xxxxxxxxxxxxxxxxx

PRD ccccccccceeeecceeeeeccccccccccccccccccceeeeeecccc 

COILS - 

MEM 

Prosite for DKFZphamy2_24b4.3

PS00016 660->663 RGD PDOC00016

(No Pfam data available for DKFZphamy2_24b4.3)

DKFZphamy2_24c8

group: transmembrane protein

DKFZphamy2_24c8 encodes a novel 454 amino acid protein without
similarity to known proteins.

The novel protein contains 1 transmembrane region.

No informative BLAST results; no predictive Prosite, Pfam or SCOP
motif.

The new protein can find application in studying the expression
profile of amygdala-specific genes and as a new marker for
amygdala cells.



putative protein

EST of GEN-426H07 is 141 bp longer at 5'-end
perhaps complete cds.
Pedant: TRANSMEMBRANE 1

Sequenced by GBF

Locus: /map="tD l 1.7 cR from top of Chr3 linkage group"

Insert length: 3200 bp

Poly A stretch at pos. 3177, polyadenylation signal at pos. 3156

1 CCTGTCCACA GGGCCCGCTC CAGCAGCCAT GGCAACCACA TCCTCCAAGC
51 CAGAGGGCCG CCCTCGAGGG CAGGCTGCCC CCACCATCCT GCTGACAAAG
101 CCACCGGGGG CCACCAGCCG CCCCACCACA GCGCCCCCCC GCACTACCAC
151 ACGCAGGCCC CCCAGGCCCC CAGGCTCTTC CCGAAAAGGG GCTGGTAATT
201 CATCACGCCC TGTCCCGCCT GCACCTGGTG GCCACTCCAG GAGTAAAGAA
251 GGACAGCGAG GACGAAATCC AAGCTCCACA CCTCTGGGGC AGAAGCGGCC
301 CCTGGGGAAA ATCTTTCAGA TCTACAAGGG CAACTTCACA GGGTCTGTGG
351 AACCGGAGCC CTCTACCCTC ACCCCCAGGA CCCCACTCTG GGGCTACTCC
401 TCTTCACCAC AGCCCCAGAC AGTGGCTGCG ACCACAGTGC CCAGCAATAC
451 CTCATGGGCA CCCACCACCA CCTCCCTGGG GCCTGCAAAG GACAAGCCAG
501 GCCTTCGCAG AGCAGCCCAG GGGGGTGGTT CTACCTTCAC CAGCCAAGGA
551 GGGACACCAG ATGCCACAGC AGCCTCAGGT GCCCCTGTCA GTCCACAAGC
601 TGCCCCAGTG CCTTCTCAGC GCCCCCACCA CGGTGACCCA CAGGATGGCC
651 CCAGCCATAG TGACTCTTGG CTTACTGTTA CCCCTGGCAC CAGCAGACCT
701 CTGTCTACCA GCTCTGGGGT CTTCACGGCT GCCACGGGGC CCACCCCAGC
751 TGCCTTCGAT ACCAGTGTCT CAGCCCCTTC CCAGGGGATT CCTCAGGGAG
801 CATCCACAAC CCCACAAGCT CCAACCCATC CCTCCAGGGT CTCAGAAAGC
851 ACTATTTCTG GAGCCAAGGA GGAGACTGTG GCCACCCTCA CCATGACCGA
901 CCGGGTGCCC AGTCCTCTCT CCACAGTGGT ATCCACAGCC ACAGGCAATT
951 TCCTCAACCG CCTGGTCCCC GCCGGGACCT GGAAGCCTGG GACAGCAGGG
1001 AACATCTCCC ATGTGGCCGA GGGGGACAAA CCGCAGCACA GAGCCACCAT
1051 CTGCCTGAGC AAGATGGATA TCGCCTGGGT GATCCTGGCC ATCAGCGTGC
1101 CCATCTCCTC CTGCTCTGTC CTGCTGACGG TGTGCTGCAT GAAGAGGAAG
1151 AAGAAGACCG CCAACCCGGA GAACAACCTG AGCTACTGGA ACAACACCAT
1201 CACCATGGAC TACTTCAACA GGCATGCTGT GGAGCTGCCC AGGGAGATCC
1251 AGTCCCTTGA AACCTCTGAG GACCAGCTCT CAGAGCCCCG CTCCCCAGCC
1301 AATGGCGACT ATAGAGACAC TGGGATGGTC CTTGTTAACC CCTTCTGTCA
1351 AGAAACACTG TTTGTGGGAA ACGATCAAGT ATCTGAGATC TAACTACAGC
1401 AGGCATCACT TTGCCATTCC GTATTTTTCG TCTCTAAATT ATAAATATAC
1451 AAATATATAT ATTATAAATA TAACCTTTGT GTAACCCTGA CTTAATGAGA
1501 AACATTTTCA GCTTTTTTTC CTATGAATTG TCAACATCTT TTTTACAAGT
1551 GTGGTTTAAA AAAAAAAAAA CTTTACAGAA TGATCTGTGG CTTTATAAAA
1601 TAAAGGTATT TCTAAGCAAA GCAGTTGCAT TGATTGCTTC TCTTAATAAC
1651 TATTCTTGAG CACCTGGGGA TCCCAGGAAC CCTGGTCAGG TGAGGTAAGA
1701 GACTGACCTC CTGTAGAAGC TGAATGTTAC AGTGGTCAAG CGCACGATTC
1751 TTTGAGTGAT TCTTAAAGCT CTGGTTCCTC TTGATTTGGT GTGACCCCAT
1801 TTCCTCCCTT CTCATACGCA CACCTGTAAA GGGAACTGGA CCGCCTCAGG
1851 GGAAGACGGC AGACTCATGC ACAGAGAAGG AAAAGGGAAC ATCTCATCAC
1901 CTCTGAGGAT GAGTACCCTG GAGCCTTATG ACGGCACCAT TGGATGTCAT
1951 GTTTAATTCC ATCCAAGTTG TGGATGGCAG GCAGGAGCAT GGAGCCCTCA
2001 GGAATCCATG GAGGACATCA AGGCATCCCA AGGCCATATT CCCCTAACAT
2051 TACTTCCACT GCTAACAACA GGACTGCCTT TCCCTGGTGG GAAAATGCTC
2101 CCTTTATGCC CATTCCTGTA TCCCCTCCAA CACCCACATC TGCATTAAAC
2151 ACCCGTGCCT TTCTCTTGGA GAGGGTTTAG ATGCAGATCC CGGCCCTGGA
2201 GCTTTAAAAT GCTTGCCCTT CCTTCTTCAA GGATCAAATG TTTATTGGGG
2251 TTCAGCTTTG TTTTCTCAAA AGGCCATGGT ATCGTGCCCC TGAGGAACAT
2301 GTTTATCTAA GAAGCTTTGA GGTAGTAGAG CGATAATTTT TGAAACCTTC
2351 CTCCTGCAAT CTTTAAAAAA GAAAAAAAAG ATTGCCCAAA CAAATCATTT
2401 GGGAGAAGAC ATCATTATAC TCCTACTTGG CACTGCAAAC CTGCTCGCAG
2451 CACCAGCCGG TGGACTTGCC ATCCAGCTCT CAGCTTCCAC TGCTCCCCTT
2501 GTTCCCGGCC GGCTGGCTGC CTCCCCGTGC TGTGTCCAGC ACGGCCAACA
2551 ACGTCAGACC CTCAGAGACG CCCAAGGGGC TTCCAGAGGT GGCCGCTTCT
2601 CTATTTTTTC CTGATTGTGG CTGAGAGAGA TGATTACTGC TTTGACACTT
2651 CCTTTCTCTA AAAGAAAAAT AGTTTGATAG TATATTTTGA ATATAGATGC
2701 TCTTATAGTC AGATTGGGAA TTGAACTTGA ATATTGGGTC ATATGTTTGT
2751 GTTGTTGCTG TAGTCTATCA TGACTTTTTT CTTTCTGCAT TTTCCTTAAA
2801 AAAAAAAAAA AGATGGCCTT CAAAAGTGTG TTCTCAATGT TGTATGAACC
2851 TCCTTCACAT GAGTTCGGTT GTTGTCTCTC TTCAAAGACT CTTCAACCCA
2901 CAAAGAAGCA ACTAAATGTT TCTCTAAGTT TAATTTTCTA GCGTGTTGTT
2951 GTCTTACCTT TTTAACCTTA CCATAATATT TCTGTTAACT GTTACATTTA
3001 ATATACCAAT GTGTGTAAGT ATACAGAGAA AAATCTGTTT GTAAAGTAAA
3051 ATTTATATAT AATATATGTA ATCAAAGATA CATATGTTAT ATATACATAT
3101 GTGGATGTAT GACTTATTTT TCCTTATCCA CAGATTTCAG CTACCATGTA
3151 TATATAAATA AACTTATTTT ATTAGCCAGA GAAAAAAAAA AAAAAAAAAA

BLAST Results 



No BLAST result

Medline entries

No Medline entry

Peptide information for frame 2

ORF from 29 bp to 1390 bp; peptide length: 454
Category: putative protein

Classification: Transmembrane proteins unclassified 

1 MATTSSKPEG RPRGQAAPTI LLTKPPGATS RPTTAPPRTT TRRPPRPPGS
51 SRKGAGNSSR PVPPAPGGHS RSKEGQRGRN PSSTPLGQKR PLGKIFQIYK
101 GNFTGSVEPE PSTLTPRTPL WGYSSSPQPQ TVAATTVPSN TSWAPTTTSL
151 GPAKDKPGLR RAAQGGGSTF TSQGGTPDAT AASGAPVSPQ AAPVPSQRPH
201 HGDPQDGPSH SDSWLTVTPG TSRPLSTSSG VFTAATGPTP AAFDTSVSAP
251 SQGIPQGAST TPQAPTHPSR VSESTISGAK EETVATLTMT DRVPSPLSTV
301 VSTATGNFLN RLVPAGTWKP GTAGNISHVA EGDKPQHRAT ICLSKMDIAW
351 VILAISVPIS SCSVLLTVCC MKRKKKTANP ENNLSYWNNT ITMDYFNRHA
401 VELPREIQSL ETSEDQLSEP RSPANGDYRD TGMVLVNPFC QETLFVGNDQ
451 VSEI
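The ORF coordinates, the frame number and the peptide length quoted for each clone are tied together by simple arithmetic, and the peptide itself follows from the cDNA by codon-wise translation. The following is a minimal sketch of that check, not the tool used to produce the listing. It assumes one-based, inclusive ORF coordinates that exclude the stop codon, the standard genetic code, and that the ORF line above reads 29 bp to 1390 bp with a peptide length of 454. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no
# alternative translation table applies to these human cDNAs).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def orf_peptide_length(start, end):
    """Number of codons in an ORF given one-based inclusive coordinates."""
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


def reading_frame(start):
    """Frame (1, 2 or 3) of a one-based start position on the forward strand."""
    return (start - 1) % 3 + 1


def translate(seq, start):
    """Translate from a one-based start position up to the first stop codon."""
    seq = seq.upper()
    peptide = []
    for i in range(start - 1, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# First 50 bases of the insert listed above.
first_row = "CCTGTCCACAGGGCCCGCTCCAGCAGCCATGGCAACCACATCCTCCAAGC"
print(orf_peptide_length(29, 1390), reading_frame(29), translate(first_row, 29))
```

Applied to the full insert, the same `translate` call would be expected to reproduce the peptide listing, so a mismatch in length or in individual residues points to a transcription error in either listing.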

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphamy2_24c8, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphamy2_24c8, frame 2

Report for DKFZphamy2_24c8.2

[LENGTH] 463
[MW] 48277.34
[pI] 1-fiO
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YJR151c] 2e-04
[BLOCKS] PRD0 c il2F
[BLOCKS] BPOSblbF
[KW] TRANSMEMBRANE 1
[KW] LOW_COMPLEXITY 15.55 %

SEQ LSTGPAPAAMATTSSKPEGRPRGQAAPTILLTKPPGATSRPTTAPPRTTTRRPPRPPGSS

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccchhhhhhhhcccccccccccccceeeeccccccccccccccccccccccccccccc 
45 MEM 

SEQ RKGAGNSSRPVPPAPGGHSRSKEGQRGRNPSSTPLGQKRPLGKIFQIYKGNFTGSVEPEP
SEG x 

PRD cccccccccccccccccccccccccccccccccccccccccceeeeeecccccccccccc 
50 MEM 

SEQ STLTPRTPLWGYSSSPQPQTVAATTVPSNTSWAPTTTSLGPAKDKPGLRRAAQGGGSTFT

SEG ••xxxxxxxx 

PRD ccccccccccccccccccceeeeeecccccccccccccccccccccccceeecccccccc 

55 MEM . 

SEQ SQGGTPDATAASGAPVSPQAAPVPSQRPHHGDPQDGPSHSDSWLTVTPGTSRPLSTSSGV
SEG xxxxx....xxxxxxxxxxxxxxxxx -
PRD ccccccccccccccccccccccccccccccccccccccccceeeeeccccccccccccce

MEM - 

SEQ FTAATGPTPAAFDTSVSAPSQGIPQGASTTPQAPTHPSRVSESTISGAKEETVATLTMTD

5 SEG xxxxxxxxxxxxxx - 

PRD eeeeccccccccccccccccccccccccccccccccccceeeeeecccchhhhhhhcccc 

MEM 

SEQ RVPSPLSTVVSTATGNFLNRLVPAGTWKPGTAGNISHVAEGDKPQHRATICLSKMDIAWV

10 SEG 

PRD ccccccceQeeeccccccccccccccccccccccceeecccccccceeeccccchhhhhh 

riEii n 

SEQ ILAISVPISSCSVLLTVCCMKRKKKTANPENNLSYWNNTITMDYFNRHAVELPREIQSLE

15 SEG . 

PRD hhhhccccccccceeeehhhhhhccccccccccccccccccccccccccccccchhhhhc 

men rinnnnriririfinnnrinrin 

SEQ TSEDQLSEPRSPANGDYRDTGMVLVNPFCQETLFVGNDQVSEI

20 SEG 

PRD cccccccccccccccccccceeeeecccccceeeeeccccccc 

riEn • . • — 
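The percentage keywords in a Pedant report (LOW_COMPLEXITY, COILED_COIL) appear to be the share of residues flagged in the corresponding mask rows of the alignment above. The sketch below illustrates that relationship under this assumption; it is not the Pedant implementation, and the function name and the example mask are hypothetical. With 72 masked residues in a 463-residue sequence it yields 15.55, which would agree with the LOW_COMPLEXITY value of this report if that value reads 15.55 %.

```python
def masked_percent(mask_rows, length, flag="x"):
    """Share of residues carrying `flag` in the mask rows, in percent.

    mask_rows: strings aligned under the sequence rows (for example SEG rows)
    length:    total number of residues in the sequence
    """
    masked = sum(row.count(flag) for row in mask_rows)
    return round(100.0 * masked / length, 2)


# Hypothetical mask with 72 flagged positions spread over three rows.
example_rows = ["x" * 27 + "." * 33, "." * 59 + "x", "x" * 44]
print(masked_percent(example_rows, 463))
```

The same function covers a coiled-coil row by passing `flag="c"`; 65 flagged positions in 769 residues give 8.45.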

(No Prosite data available for DKFZphamy2_24c8.2)

(No Pfam data available for DKFZphamy2_24c8.2)

DKFZphamy2_24k15

group: amygdala derived

DKFZphamy2_24k15 encodes a novel 279 amino acid protein with weak
similarity to pecanex of Drosophila melanogaster.

Pecanex is a maternal-effect neurogenic gene, involved in
differentiation processes in the developing central nervous
system. DKFZphamy2_24k15.3 seems to be expressed ubiquitously.

The new protein can find application in studying the expression
profile of amygdala-specific genes and as a new marker for
amygdala cells.



similarity to pecanex (Drosophila melanogaster)
probably complete cds.
Sequenced by GBF
Locus: unknown

Insert length: 1464 bp

Poly A stretch at pos. 1445, polyadenylation signal at pos. 1421

1 AAGGAAAACA AGAGGACATG CCATATATTC CTCTCATGGA GTTCAGTTGT
51 TCACATTCTC ACTTAGTATG CTTACCCGCA GAGTGGAGGA CTAGCTGTAT
101 GCCCAGTTCC AAAATGAAGG AGATGAGCTC GTTATTTCCA GAAGACTGGT
151 ACCAATTTGT TCTAAGGCAG TTGGAATGTT ATCATTCAGA AGAGAAGGCC
201 TCAAATGTAC TGGAAGAAAT TGCCAAGGAC AAAGTTTTAA AAGACTTTTA
251 TGTTCATACA GTAATGACTT GTTATTTTAG TTTATTTGGA ATAGACAATA
301 TGGCTCCTAG TCCTGGTCAT ATATTGAGAG TTTACGGTGG TGTTTTGCCT
351 TGGTCTGTTG CTTTGGACTG GCTCACAGAA AAGCCAGAAC TGTTTCAACT
401 AGCACTGAAA GCATTCAGGT ATACTCTGAA ACTAATGATT GATAAAGCAA
451 GTTTAGGTCC AATAGAAGAC TTTAGAGAAC TGATTAAGTA CCTTGAAGAA
501 TATGAACGTG ACTGGTACAT TGGTTTGGTA TCTGATGAAA AGTGGAAGGA
551 AGCAATTTTA CAAGAAAAGC CATACTTGTT TTCTCTGGGG TATGATTCTA
601 ATATGGGAAT TTACACTGGG AGAGTGCTTA GCCTTCAAGA ATTATTGATC
651 CAAGTGGGAA AGTTAAATCC TGAAGCTGTT AGAGGTCAGT GGGCCAATCT
701 TTCATGGGAA TTACTTTATG CCACAAACGA TGATGAAGAA CGTTATAGTA
751 TACAAGCTCA TCCACTACTT TTAAGAAATC TTACGGTACA AGCAGCAGAA
801 CCTCCCCTGG GATATCCGAT TTATTCTTCA AAACCTCTCC ACATACATTT
851 GTATTAGAGC TCATTTTGAC TGTAATGTCA TCAAATGCAA TGTTTTTATT
901 TTTTCATCCT AAAAAAGTAA CTGTGATTCT TGTAACTTGA GGACTTCTCC
951 ACACCCCCAT TCAGATGCCT GAGAACAGCT AAGCTCCGTA AAGTTGGTTC
1001 TCTTAGCCAT CTTAATGGTT CTAAAAAACA GCAAAAACAT CTTTATGTCT
1051 AAGATAAAAG AACTATTTGG CCAATATTTG TGCCCTCTGG ACTTTAGTAG
1101 GCTTTGGTAA ATGTGAGAAA ACTTTTGTAG AATTATCATA TAATGAATTT
1151 TGTAATGCTT TCTTAAATGT GTTATAGGTG AATTGCCATA CAAAGTTAAC
1201 AGCTATGTAA TTTTTACATA CTTAAGAGAT AAACATATCA GTGTTCTAAG
1251 TAGTGATAAT GGATCCTGTT GAAGGTTAAC ATAATGTGTA TATATTTGTT
1301 TGAAATATAA TTTATAGTAT TTTCAAATGT GCTGATTTAT TTTGACATCT
1351 AATATCTGAA TGTTTTTGTA TCAAGTAGTT TGTTTTCATA GACTTCAATT
1401 CATAAACTTT AAAAAACTTT TAATAAAATA TTTTCCTTCC TTTTCAAAAA
1451 AAAAAAAAAA AAAA
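The polyadenylation signal and the poly A stretch reported for each insert can be located directly in the listed sequence. The following sketch shows one way to do this; it assumes the canonical AATAAA hexamer is the signal meant and that the poly A stretch is the terminal run of A residues. The function names and the `first_pos` parameter (the coordinate assigned to the first base of the fragment) are hypothetical.

```python
def find_polya_signals(seq, first_pos=1, motif="AATAAA"):
    """Coordinates of every (possibly overlapping) occurrence of the motif."""
    seq = seq.upper()
    hits = []
    i = seq.find(motif)
    while i != -1:
        hits.append(i + first_pos)
        i = seq.find(motif, i + 1)
    return hits


def polya_tail_start(seq, first_pos=1, min_run=10):
    """Coordinate where the terminal run of A residues begins, or None."""
    seq = seq.upper()
    i = len(seq)
    while i > 0 and seq[i - 1] == "A":
        i -= 1
    return i + first_pos if len(seq) - i >= min_run else None


# Last 64 bases of the insert listed above (rows 1401 and 1451).
tail = ("CATAAACTTTAAAAAACTTTTAATAAAATATTTTCCTTCCTTTTCAAAAA"
        "AAAAAAAAAAAAAA")
print(find_polya_signals(tail, first_pos=1401), polya_tail_start(tail, first_pos=1401))
```

With one-based numbering the hexamer starts at 1422 and the terminal A run at 1446. Counting from zero instead (`first_pos=1400`) gives 1421 and 1445, so the positions quoted for this insert appear to use the smaller convention, assuming they read 1445 and 1421.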



BLAST Results 



Entry AC007939 from database EMBLNEW:

Homo sapiens clone 422_H_5, WORKING DRAFT SEQUENCE, 5 unordered
pieces.

Score = 4116, P = 0.0e+00, identities = 840/858
3 exons



Medline entries 

No Medline entry 

Peptide information for frame 3 



ORF from 18 bp to 854 bp; peptide length: 279
Category: similarity to known protein
Classification: unclassified

1 MPYIPLMEFS CSHSHLVCLP AEWRTSCMPS SKMKEMSSLF PEDWYQFVLR
51 QLECYHSEEK ASNVLEEIAK DKVLKDFYVH TVMTCYFSLF GIDNMAPSPG
101 HILRVYGGVL PWSVALDWLT EKPELFQLAL KAFRYTLKLM IDKASLGPIE
151 DFRELIKYLE EYERDWYIGL VSDEKWKEAI LQEKPYLFSL GYDSNMGIYT
201 GRVLSLQELL IQVGKLNPEA VRGQWANLSW ELLYATNDDE ERYSIQAHPL
251 LLRNLTVQAA EPPLGYPIYS SKPLHIHLY



BLASTP hits 

No BLASTP hits available

Alert BLASTP hits for DKFZphamy2_24k15, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphamy2_24k15, frame 3

Report for DKFZphamy2_24k15.3

[LENGTH] 284
[MW] 33Dbl3.31
[pI] 5.17
[HOMOL] TREMBL:AF067608_11 gene: "B0511.12"; Caenorhabditis elegans cosmid B0511. 2e-13
[KW] Alpha_Beta

SEQ GKQEDMPYIPLMEFSCSHSHLVCLPAEWRTSCMPSSKMKEMSSLFPEDWYQFVLRQLECY
PRD ccccccccccceeeecccceeeeecccccccccccccccccccccccchhhhhhhhhhhh

SEQ HSEEKASNVLEEIAKDKVLKDFYVHTVMTCYFSLFGIDNMAPSPGHILRVYGGVLPWSVA
PRD hhhhhhhhhhhhhhhhhhhhhhheeeeeeeeeeeecccccccccceeeeeeccccccccc

SEQ LDWLTEKPELFQLALKAFRYTLKLMIDKASLGPIEDFRELIKYLEEYERDWYIGLVSDEK
PRD cchhhhhchhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhh

SEQ WKEAILQEKPYLFSLGYDSNMGIYTGRVLSLQELLIQVGKLNPEAVRGQWANLSWELLYA
PRD hhhhhhhhcchhhhhhhhcccccchhhhhhhhhhhhheeeeechhhhhhhhhhh

SEQ TNDDEERYSIQAHPLLLRNLTVQAAEPPLGYPIYSSKPLHIHLY
PRD cccccccccccchhhhhhhhhhhccccccccccccccccccccc



(No Prosite data available for DKFZphamy2_24k15.3)

(No Pfam data available for DKFZphamy2_24k15.3)

DKFZphamy2_2a13



group: amygdala derived

DKFZphamy2_2a13 encodes a novel 440 amino acid protein without
similarity to known proteins.

No informative BLAST results; no predictive Prosite, Pfam or SCOP
motif.

The new protein can find application in studying the expression
profile of amygdala-specific genes.

putative protein

perhaps complete cds.

Sequenced by MediGenomix

Locus: /map="16p13.3"

Insert length: 2584 bp

Poly A stretch at pos. 2562, polyadenylation signal at pos. 2545



1 GTTCCTGAGG ACGTGCTACG GGGGCAGCTT CCTGGTACAC GAGTCGTTCC
51 TCTACAAGCG GGAGAAGGCT GTCGGGGACA AGGTGTATTG GACCTGCCGG
101 GACCACGCGC TGCACGGCTG CCGGAGCCGG GCCATCACCC AGGGACAGCG
151 GGTGACTGTG ATGCGTGGGC ACTGCCACCA GCCCGATATG GAGGGCCTGG
201 AAGCCCGGCG GCAGCAGGAG AAGGCCGTGG AGACGCTGCA GGCTGGGCAG
251 GACGGCCCTG GGAGCCAAGT GGACACGCTG CTCCGAGGCG TGGATAGTTT
301 GCTCTACCGC AGGGGTCCGG GTCCCCTGAC TCTCACCAGG CCTCGGCCCA
351 GAAAGCGAGC AAAGGTCGAA GACCAGGAGC TGCCAACCCA GCCCGAGGCC
401 CCAGACGAGC ACCAGGACAT GGACGCAGAC CCGGGAGGCC CTGAGTTCCT
451 GAAGACGCCC CTGGGGGGCA GCTTCCTGGT GTACGAGTCC TTCCTCTACC
501 GGCGGGAGAA GGCGGCTGGG GAGAAGGTGT ATTGGACCTG CCGGGACCAG
551 GCCCGCATGG GCTGCCGCAG CCGCGCCATC ACCCAGGGCC GACGGGTGAC
601 TGTCATGCGT GGTCACTGCC ACCCGCCCGA CCTGGGAGGC CTGGAGGCCC
651 TGAGGCAGCG GGAGAAACGC CCCAACACGG CGCAGCGGGG GAGCCCAGGC
701 GCTGGCCTCT CTTTCCAGTG GCTCTTCCGG ATCCTGCAGC TTTTGGGTCA
751 TGCTCCTGTG CTGCTGTGCC CCTCAGGGTC CTCCTGCCTC CCGAGCCTCC
801 CTGCTCCACA TGGCCCCTGC CCAGCCCTCT CCATCCCTCT TGAAGGAGGC
851 CCCGAGTTCC TGAAGACGCC CCTGGGGGGC AGCTTCCTGG TGTACGAGTC
901 CTTCCTCTAC CGGCGGGAGA AGGCGGCCGG GGAGAAGGTG TATTGGACCT
951 GCCGGGACCA GGCCCGCATG GGCTGCCGCA GCCGCGCCAT CACCCAGGGC
1001 CGGCGGGTCA TGGTCATGCG CAGGCACTGC CACCCACCGG ACCTGGGCGG
1051 CCTGGAGGCC CTGCGGCAGC GGGAGCACTT CCCCAACCTG GCGCAGTGGG
1101 ACAGCCCAGA TCCTCTCCGG CCCCTGGAGT TCCTGAGGAC TTCCCTGGGG
1151 GGCAGGTTCC TGGTGCACGA GTCCTTCCTC TACAGGAAGG AGAAGGCGGC
1201 TGGGGAGAAG GTGTACTGGA TGTGCCGGGA CCAGGCTCGG CTGGGCTGCC
1251 GCAGCCGCGC CATAACCCAG GGCCACCGCA TCATGGTCAT GCGCAGCCAC
1301 TGCCATCAGC CTGACCTGGC AGGCCTGGAG GCCTTGAGGC AACGGGAGCG
1351 GCTCCCCACC ACGGCCCAGC AGGAGGACCC AGAAAAGATT CAAGTTCAGC
1401 TGTGCTTCAA GACGTGTTCT CCTGAAAGCC AGCAGATTTA TGGGGACATC
1451 AAAGACGTCA GACTGGATGG CGAGTCCCAG TGAGGCGATG TGGGCAGAGG
1501 AGCTCCGAGC CGCCCACCCA AGGTGGCTTC ACATCCACAC AGGCACTTCC
1551 CATCCACCTA GGTTTGGCTT AGCAGAAACT TCTTTTCATT CTTCCAAAGC
1601 ATCGATGGTC TTCGCGTCTC CTCAGGAGGT CTCCCAGGAG GAATTCTTGG
1651 ATGGTGTCCT CATGTCGGCG GAGAACAGTG CTCAGAGCTG GCGCTTGCAG
1701 ACGCAGCTGT CGTGGGGCAG GGCGGTGGCG CCTTCCTGAC CTTTGGAAGA
1751 CATGACAAAG CTGCCTGGAC ACGGACGCCC CTGCTGTACG GCCACAGCAC
1801 CCCTGGGTTT GCAGAGCACG CAGCCTTCCT AGGGCTTTCC ACCTGGCGAG
1851 GCCCCGCTCT GCTCAGCACG GTGCAAAGTG AATGCTGCTG TCTTGGAGCC
1901 TGGGCACGTT TGGGGAAGTT CCTGCTTCAA ACTGAGCTGC CCCGCATAGG
1951 CCAGGTCAAC CCACACCAAT CTTTTCTGGA CAGGTGCTGG GTAGGCCTTC
2001 CTGGTCTCTG GCCGCCTGCT GCCAGGGTGT GGCCATCCCC AGCAACCGGA
2051 GCCGGCCAAA CCAGAGGCCT CGCTCCGCAC TCCACACTTT CCTTTCTGTG
2101 CTCCTTCCAA GTTAAATTAA ACCCCCTCTC CACGATTCCC ACGGCAGGCG
2151 TCATTCCCGA GATGGGAGCC AGTCCAGGGG TCAGCAGGAG CCAGCGCTGG
2201 GCACACGTGC CCTGGCTGAG GCCAGCGGCA TCCTGGGTGG CCCAGGTCCA
2251 TCCTGGGCAG CAAAGGCGTG TCCCCTTCTG TCAGACAGCT TCACAGAGTG
2301 TGGCTTCACC AGTCAGAGGG AGCAGTCCGG AGAGGCAAGA TGACCCCACC
2351 GGGACTGCAG AGCCTCCTCC TTACTAACAA GGACCTGTCC GCAGCCGCGA
2401 GGTCCTTCAC TCCCACCCTG TAATTGTGGG GGGAGTGCCA GCAACAGGCC
2451 TGTCCCCTGG CAAGTTGGCC ACGGAACCCA CCATGCACTG CAAGGCTGTG
2501 ACAGCCTGGG CACCCCTGCT TCTCCTCTGC TTGTACGGTT CCCCCAATAA
2551 ATCCTATTTT CCATCAAAAA AAAAAAAAAA AAAA

BLAST Results

No BLAST result

Medline entries

No Medline entry

Peptide information for frame 2

ORF from 161 bp to 1480 bp; peptide length: 440
Category: putative protein 
Classification: no clue 

1 MRGHCHQPDM EGLEARRQQE KAVETLQAGQ DGPGSQVDTL LRGVDSLLYR
51 RGPGPLTLTR PRPRKRAKVE DQELPTQPEA PDEHQDMDAD PGGPEFLKTP
101 LGGSFLVYES FLYRREKAAG EKVYWTCRDQ ARMGCRSRAI TQGRRVTVMR
151 GHCHPPDLGG LEALRQREKR PNTAQRGSPG AGLSFQWLFR ILQLLGHAPV
201 LLCPSGSSCL PSLPAPHGPC PALSIPLEGG PEFLKTPLGG SFLVYESFLY
251 RREKAAGEKV YWTCRDQARM GCRSRAITQG RRVMVMRRHC HPPDLGGLEA
301 LRQREHFPNL AQWDSPDPLR PLEFLRTSLG GRFLVHESFL YRKEKAAGEK
351 VYWMCRDQAR LGCRSRAITQ GHRIMVMRSH CHQPDLAGLE ALRQRERLPT
401 TAQQEDPEKI QVQLCFKTCS PESQQIYGDI KDVRLDGESQ

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphamy2_2a13, frame 2

No Alert BLASTP hits found

Pedant information for DKFZphamy2_2a13, frame 2

Report for DKFZphamy2_2a13.2

[LENGTH] 493
[MW] 55840.13
[pI] 9.33
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 9.21 %

SEQ FLRTCYGGSFLVHESFLYKREKAVGDKVYWTCRDHALHGCRSRAITQGQRVTVMRGHCHQ

SEG 

PRD ccccccccceeeccchhhhhhhhhccceeeeecccccccccceeeeccceeeeeeccccc 

SEQ PDMEGLEARRQQEKAVETLQAGQDGPGSQVDTLLRGVDSLLYRRGPGPLTLTRPRPRKRA

25 SEG f xxxxxxxxxxxxxxx • • o 

PRD cccchhhhhhhhhhhhhhhhhcccccccccccccccccceeeeecccceeecccccchhh 

SEQ KVEDQELPTQPEAPDEHQDMDADPGGPEFLKTPLGGSFLVYESFLYRREKAAGEKVYWTC

SEG 

30 PRD hhhhhcccccccccccccccccccccccccccccccceeeehhhhhhhhhhhccceeeec 

SEQ RDQARMGCRSRAITQGRRVTVMRGHCHPPDLGGLEALRQREKRPNTAQRGSPGAGLSFQW

SEG . 

PRD cchhhhhccceeecccceeeeeecccccccccchhhhhhhhhccccccccccccchhhhh. 


SEQ LFRILQLLGHAPVLLCPSGSSCLPSLPAPHGPCPALSIPLEGGPEFLKTPLGGSFLVYES

SEG xxxxxxxxxxxxxxxx 

PRD hhhhhhhhhccceeeccccccccccccccccccccccccccccccccccccccceeeehh 

SEQ FLYRREKAAGEKVYWTCRDQARMGCRSRAITQGRRVMVMRRHCHPPDLGGLEALRQREHF

SEG 

PRD hhhhhhhhhccceeeeccchhhhhccceeecccceeeeeecccccccccchhhhhhhhhc 

SEQ PNLAQWDSPDPLRPLEFLRTSLGGRFLVHESFLYRKEKAAGEKVYWMCRDQARLGCRSRA

45 SEG 

PRD ccccccccccccchhhhhhhcccceeeeecchhhhhhhhccceeeecchhhhhhhccccc 

SEQ ITQGHRIMVMRSHCHQPDLAGLEALRQRERLPTTAQQEDPEKIQVQLCFKTCSPESQQIY

SEG 

50 PRD ccccceeeeeeccccccccchhhhhhhhhhhhhccccccccceeehhhhhcccccccccc 

SEQ GDIKDVRLDGESQ

SEG 

PRD ccccccccccccc 






(No Prosite data available for DKFZphamy2_2a13.2)

(No Pfam data available for DKFZphamy2_2a13.2)

DKFZphamy2_2b19

group: differentiation/development

DKFZphamy2_2b19 encodes a novel 789 amino acid protein which
originates from TXBP151 mRNA by alternative splicing.

It is ubiquitously expressed. The mRNA is also subject to
alternative polyadenylation. Overexpression of TXBP151 in NIH3T3
cells causes inhibition of apoptosis induced by tumour necrosis
factor (TNF). It binds to A20, which is also an inhibitor of cell
death by a yet unknown mechanism.

The new protein can find application in modifying/blocking
apoptotic pathways and therefore serve as a tool in diagnosis of
cancer predisposition and as a tool in cell culture.



TXBP151, differentially spliced

differential splicing

differential polyadenylation

Sequenced by MediGenomix

Locus: /map="7p15"

Insert length: 3028 bp

Poly A stretch at pos. 2885, polyadenylation signal at pos. 2868

1 GAAGAGGTTC GGCGGCTGAT GGCGGATCAG GATCGGAAGC CTGCGTAACT
51 TTCTCCCTTG ATCCGGGAGT CTTTCCACTG GATTCACAAT GACATCCTTT
101 CAAGAAGTCC CATTGCAGAC TTCCAACTTT GCCCATGTCA TCTTTCAAAA
151 TGTGGCCAAG AGTTACCTTC CTAATGCACA CCTGGAATGT CATTACACCT
201 TAACTCCATA TATTCATCCA CATCCAAAAG ATTGGGTTGG TATATTCAAG
251 GTTGGATGGA GTACTGCTCG TGATTATTAC ACGTTTTTAT GGTCCCCTAT
301 GCCTGAACAT TATGTGGAAG GATCAACAGT CAATTGTGTA CTAGCATTCC
351 AAGGATATTA CCTTCCAAAT GATGATGGAG AATTTTATCA GTTCTGTTAC
401 GTTACCCATA AGGGTGAAAT TCGTGGAGCA AGTACACCTT TCCAGTTTCG
451 AGCTTCTTCT CCAGTTGAAG AGCTGCTTAC TATGGAAGAT GAAGGAAATT
501 CTGACATGTT AGTGGTGACC ACAAAAGCAG GCCTTCTTGA GTTGAAAATT
551 GAGAAAACCA TGAAAGAAAA AGAAGAACTG TTAAAGTTAA TTGCCGTTCT
601 GGAAAAAGAA ACAGCACAAC TTCGAGAACA AGTTGGGAGA ATGGAAAGAG
651 AACTTAACCA TGAGAAAGAA AGATGTGACC AACTGCAAGC AGAACAAAAG
701 GGTCTTACTG AAGTAACACA AAGCTTAAAA ATGGAAAATG AAGAGTTTAA
751 GAAGAGGTTC AGTGATGCTA CATCCAAAGC CCATCAGCTT GAGGAAGATA
801 TTGTGTCAGT AACACATAAA GCAATTGAAA AAGAAACCGA ATTAGACAGT
851 TTAAAGGACA AACTCAAGAA GGCACAACAT GAAAGAGAAC AACTTGAATG
901 TCAGTTGAAG ACAGAGAAGG ATGAAAAGGA ACTTTATAAG GTACATTTGA
951 AGAATACAGA AATAGAAAAT ACCAAGCTTA TGTCAGAGGT CCAGACTTTA
1001 AAAAATTTAG ATGGGAACAA AGAAAGCGTG ATTACTCATT TCAAAGAAGA
1051 GATTGGCAGG CTGCAGTTAT GTTTGGCTGA AAAGGAAAAT CTGCAAAGAA
1101 CTTTCCTGCT TACAACCTCA AGTAAAGAAG ATACTTGTTT TTTAAAGGAG
1151 CAACTTCGTA AAGCAGAGGA ACAGGTTCAG GCAACTCGGC AAGAAGTTGT
1201 CTTTCTGGCT AAAGAACTCA GTGATGCTGT CAACGTACGA GACAGAACGA
1251 TGGCAGACCT GCATACTGCA CGCTTGGAAA ACGAGAAAGT GAAAAAGCAG
1301 TTAGCTGATG CAGTGGCAGA ACTTAAACTA AATGCTATGA AAAAAGATCA
1351 GGACAAGACT GATACACTGG AACACGAACT AAGAAGAGAA GTTGAAGATC
1401 TGAAACTCCG TCTTCAGATG GCTGCAGACC ATTATAAAGA AAAATTTAAG
1451 GAATGCCAAA GGCTCCAAAA ACAAATAAAC AAACTTTCAG ATCAATCAGC
1501 TAATAATAAT AATGTCTTCA CAAAGAAAAC GGGGAATCAG CAGAAAGTGA
1551 ATGATGCTTC AGTAAACACA GACCCAGCCA CTTCTGCCTC TACTGTAGAT
1601 GTAAAGCCAT CACCTTCTGC AGCAGAGGCA GATTTTGACA TAGTAACAAA
1651 GGGGCAAGTC TGTGAAATGA CCAAAGAAAT TGCTGACAAA ACAGAAAAGT
1701 ATAATAAATG TAAACAACTC TTGCAGGATG AGAAAGCAAA ATGCAATAAA
1751 TATGCTGATG AACTTGCAAA AATGGAGCTG AAATGGAAAG AACAAGTGAA
1801 AATTGCTGAA AATGTAAAAC TTGAACTAGC TGAAGTACAG GACAATTATA
1851 AAGAACTTAA AAGGAGTCTA GAAAATCCAG CAGAAAGGAA AATGGAAGGT
1901 CAGAATTCCC AGAGTCCTCA ATGTTTCAAA ACATGCTCAG AGCAAAATGG
1951 TTATGTTCTC ACATTGTCAA ATGCACAACC AGTTCTGCAA TATGGTAATC
2001 CTTATGCATC TCAGGAAACA AGAGATGGAG CAGATGGTGC TTTTTACCCA
2051 GATGAAATAC AAAGGCCACC TGTCAGAGTC CCCTCTTGGG GACTGGAAGA
2101 CAATGTTGTC TGCAGCCAGC CTGCTCGAAA CTTTAGTCGG CCTGATGGCT
2151 TAGAGGACTC TGAGGATAGC AAAGAAGATG AGAATGTGCC TACTGCTCCT
2201 GATCCTCCAA GTCAACATTT ACGTGGGCAT GGGACAGGCT TTTGCTTTGA
2251 TTCCAGCTTT GATGTTCACA AGAAGTGTCC CCTCTGTGAG TTAATGTTTC
2301 CTCCTAACTA TGATCAGAGC AAATTTGAAG AACATGTTGA AAGTCACTGG
2351 AAGGTGTGCC CGATGTGCAG CGAGCAGTTC CCTCCTGACT ATGACCAGCA
2401 GGTGTTTGAA AGGCATGTGC AGACCCATTT TGATCAGAAT GTTCTAAATT
2451 TTGACTAGTT ACTTTTTATT ATGAGTTAAT ATAGTTTAGC AGTAAAAAAA
2501 AAAAAAAAAA ACCACACCTA AAATAGACCA CTGAGGAGAC CATAGAGCGG
2551 ATGCTTTCAT GCACCCTTTA CTGCACTTTC TGACCAGGAG CTACTTTGAG
2601 TTTGGTGTTA CTAGGATCAG GGTCAGTCTT TGGCTTATCA ATAAATTTTA
2651 ATCTCTGTTA ATCTTACCTG CTTTAAAAAA AAGTTCTTGT GTGTTCGTAT
2701 CTTTATTTAT TCCCTAGTTT GCAGAACTGT CTGAATAAAG GATACAAGGA
2751 TTATTTCAAT GTTACTGCAC TGAAAAACGT GTATGTATTA GTGTGCTAGA
2801 TTATTTAGCA GAATATTCAC AAGTTTCTGT TGACCTTGTT GATTGAGCAT
2851 GACTACTAAA TATTATGTAA TAAAAAGCAT TTGTCATAAC AAAAAAAAAA
2901 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
2951 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
3001 AAAAAAAAAA AAAAAAAAAA AAAAAAAA

BLAST Results



No BLAST result

Medline entries

nSfcliaM:
De Valck D, Jin DY, Heyninck K, Van de Craen M, Contreras R,
Fiers W, Jeang KT, Beyaert R.; The zinc finger protein A20
interacts with a novel anti-apoptotic protein which is cleaved by
specific caspases. Oncogene 1999 Jul 22;18(29):4182-90

Peptide information for frame 2

ORF from 89 bp to 2455 bp; peptide length: 789
Category: known protein
Classification: Cell division



1 MTSFQEVPLQ TSNFAHVIFQ NVAKSYLPNA HLECHYTLTP YIHPHPKDWV
51 GIFKVGWSTA RDYYTFLWSP MPEHYVEGST VNCVLAFQGY YLPNDDGEFY
101 QFCYVTHKGE IRGASTPFQF RASSPVEELL TMEDEGNSDM LVVTTKAGLL
151 ELKIEKTMKE KEELLKLIAV LEKETAQLRE QVGRMERELN HEKERCDQLQ
201 AEQKGLTEVT QSLKMENEEF KKRFSDATSK AHQLEEDIVS VTHKAIEKET
251 ELDSLKDKLK KAQHEREQLE CQLKTEKDEK ELYKVHLKNT EIENTKLMSE
301 VQTLKNLDGN KESVITHFKE EIGRLQLCLA EKENLQRTFL LTTSSKEDTC
351 FLKEQLRKAE EQVQATRQEV VFLAKELSDA VNVRDRTMAD LHTARLENEK
401 VKKQLADAVA ELKLNAMKKD QDKTDTLEHE LRREVEDLKL RLQMAADHYK
451 EKFKECQRLQ KQINKLSDQS ANNNNVFTKK TGNQQKVNDA SVNTDPATSA
501 STVDVKPSPS AAEADFDIVT KGQVCEMTKE IADKTEKYNK CKQLLQDEKA
551 KCNKYADELA KMELKWKEQV KIAENVKLEL AEVQDNYKEL KRSLENPAER
601 KMEGQNSQSP QCFKTCSEQN GYVLTLSNAQ PVLQYGNPYA SQETRDGADG
651 AFYPDEIQRP PVRVPSWGLE DNVVCSQPAR NFSRPDGLED SEDSKEDENV
701 PTAPDPPSQH LRGHGTGFCF DSSFDVHKKC PLCELMFPPN YDQSKFEEHV
751 ESHWKVCPMC SEQFPPDYDQ QVFERHVQTH FDQNVLNFD


BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphamy2_2b19, frame 2

TREMBL:HS33fl211_l product: "tax1-binding protein TXBP151"; Homo
sapiens tax1-binding protein TXBP151 mRNA, complete cds., N = 2,
Score = 2948, P = 0

>TREMBL:HS33fl211_l product: "tax1-binding protein TXBP151"; Homo
sapiens tax1-binding protein TXBP151 mRNA, complete cds.
Length = 747

HSPs:

Score = 2948 (442.3 bits), Expect = 0.0e+00, Sum P(2) = 0.0e+00
Identities = 575/603 (95%), Positives = 576/603 (95%)

duery: 1 

IITSFflEVPLflTSNFAHVIFfJNVAKSYLPNAHLECHYTLTPYIHPHPKDUVGIFKVGlilSTA tD 

HTSF(3EVPL(3TSNFAHVIF(2NVAKSYLPNAHLECHYTLTPYIHPHPKDUVGIFKVGliJSTA 
Sbjct: 1 

MTSF(3EVPLflTSNFAH\/IF(2NVAKSYLPNAHLECHYTLTPYIHPHPKDlJVGIFKVGL)STA tO
Query: 61

RDYYTFLWSPnPEHYVEGSTVNCVLAFflGYYLPNDDGEFYflFCYVTHKGEIRGASTPFflF 15D 

5 RDYYTFLldSPnPEHYVEGSTVNCVLAFOGYYLPNDDGEFYiSFCYVTHKGEIRGASTPFQF 
Sbjct: bl 

RDYYTFLWSPMPEHYVEGSTVNCVLAF(2GYYLPNDI>GEFY(2FCYVTHKGEIRGASTPFt2F ISO 
fluery: 121 

10 RASSPVEELLTIIEDEGNSDriLVVTTKAGXXXXXXXXXXXXXXXXXXXXXXXXXXTAQLRE ISO 

RASSPVEELLTMEDEGNSDMLVVTTKAG 

TAflLRE 

Sbjct: 121 

RASSPVEELLTI1EDEGNSDML VVTTKAGLLELKIEKTMKE<EELLKLIAVLEKETA(2LRE ISO 


fluery: 161 

OVGRMERELNHEKERCDflLOAEflKGLTEVTflSLKPlENEEFKKRFSDATSKAHflLEEDIVS 2M0 
£JVGRI1ERELNHEKERC]>flL(3AE(3KGLTEVT(3SLKnENEEFKKRFSI>ATSKAH 

+EEDIVS 
20 Sbjct: Ifil 

(3VGRMERELNHEKERCDQLflAE(3KGLTEVTi3SLKt1ENEEFKKRFSDATSKAHHVEEDIVS 2MD 
fluery: 2M1 

• VTHKAIEKETELl>SLKI>KLKKA(3HEREflLECi3LKTEKI>EKELYKVHLKNTEIENTKLriSE 30D 


VTHKAIEKETELDSLKDKLKKAflHEREQLECQLKTEKDEKELYKVHLKNTEIENTKLnSE 
Sbjct: 2M1 

VTHKAIE<ETELDSLKDKLKKAi3HERE(3LECl2LKTEKI>EKELYKVHLKNTEIENTKLMSE 300 
30 fiuery: 301 

VflTLKNLDGNKESVITHFKEEIGRLtJLCLAEKENLflRTFLLTTSSKEDTCFLKEcSLRKAE 3b0 

V(3TLKNLDGNKESVITHFKEEIGRL(3LCLAEKENL(2RTFLLTTSSKEDTCFLKEcJLRKAE 
Sbjct: 301 

35 V(3TLKNL-]>GNKESVITHFKEEIGRL(2LCLAEKENL£2RTFLLTTSSKE]>TCFLKE(2LRKAE 3b0 
(3uery: 3bl 

EOVOATRflEVVFLAKELSDAVNVRDRTriADLHTARLENEKVKKCL ADAVAELKLNAIIKKD M2D 

40 E(3 V(3ATR(3EVVFLAKELSDAVNVRDRTHAPLHTARLENEKVKK(3L ADAVAELKLNAHKKD 
Sbjct: 3bl 

E(3V(3ATRi3EVVFLAKELSDAVNVRDRTnADLHTARLENEKVKK(3L AI>AVAELKLNAMKKD M20 
Query' M21 

45 (3DKTI>TLEHELRREVEI>LKLRLi2riAADHYKEKFKEC(3RLi3Ki3INKLSI>£3SANNNNVFTKK MfiD 

(3DKTDTLEHELRREVEDLKLRL«2riAADHYKEKFKEC(3RL(3K<3 IN<LSB(3SANNNNVFTKK 
Sbjct: 1421 

£3DKTDTLEHELRREVE]>LKLRL(3riAADHYKEKFKEC(3RL(3Kl3INKLSP(3SANNNNVFTKIC MBO 


fluery: MSI 

TGNfiflKVNDASVNTDPATSASTVI) VKPSPSAAE ADFDI VTKGtJVCEMTKEIADKTEKYNK SMD 

TGN(3(3KVNDASVNTDPATSASTVD VKPSPSAAEADFDI VTKGOVCEHTKEIADKTEKYNK 
55 Sbjct: M61 

TGNOflKVNDASVNTDPATSASTVDVKPSPSAAEADFDIVTKGflVCEMTKEIADKTEKYNK 510
Query: 541

CK(3LLCI>E>CAKCNKYAI>ELAICf1ELKlJKE(3VKIAENVKLELAEVt3DNYKELKRSLENPAER bDO 

CKr3LLC3DEKAtsCNKYADELAKMELKbIKEt3VKIAENVKLELAEV(3DNYKELKRSLENPAER 
5 Sbjct: 5m 

CKC3LL(3DEKAKCNKYADELAKMELKUKE(3VKIAENVKLELAEVC3DNYK:ELKRSLENPAER bdO 

Query: bOl K11E bD3 
KME 

10 Sbjct: bOl KflE bD3 

Score = 831 (124.7 bits), Expect = 0.0e+00, Sum P(2) = 0.0e+00
Identities = 147/153 (96%), Positives = 149/153 (97%)

15 Query: b37 

NPYASc3ETRDGADGAFYPDEIt3RPPVRVPSlilGLEDNVVCS(3PARNFSRPDGLEDSEDSKE L^L 
NP A + + 

DGADGAFYPDEIdRPPVRVPSWGLEDNVVCSgPARNFSRPDGLEDSEDSKE 
Sbjct: 5Tb NP- 

20 AERKMED6ADCAFYPDEK3RPPVRVPSUGLEDNVVCSC3PARNFSRPDGLEDSEDSKE L5H 
Query: bT7 

DENVPTAPDPPS(3HLRGH6TGFCFPSSFDVHKKCPLCELriFPPNYD(3SKFEEHVESHUKV 75L 

25 DENVPTAPDPPS£3HLRGHGTGFCFDSSFDVHKKCPLCELnFPPNYD(3SKFEEHVESHWKV 
Sbjct: b55 

DENVPTAPDPPS(3HLRGHGTGFCFDSSFDVHKKCPLCELI1FPPNYD(3SKFEEHVESHIJJCV 71M 

Query: 757 CPMCSEc3FPPDYD<3t2VFERHV(3THFDtfNVLNFD 7flT 
30 CPnCSE<3FPPDYDdt2VFERHV(2THFI><2NVLNFD 

Sbjct: 715 CPHCSE(3FPPDYDt3(3VFERHV(3THFD(3NVLNFI> 7147 






Score = 104 (15.6 bits), Expect = 9.2e-02, Sum P(2) = 8.8e-02
Identities = 80/351 (22%), Positives = 157/351 (44%)



fluery: 177 <3LR EdVGRMERELNH- 

EKERCDflL(2AE(2KGLTE VTflSLKHENEEFKKRFSD ATSKAH 232 

QLR E(2V +E+ KE D + + + + + + ++ENE+ KK + 

+ DA 

40 Sbjct: 355 <2LRKAEE<2V(2ATR(2EVVFLAKELSDAVNVRDRTnADL- 
HTARLENEKVKKdLADA MOfl 

fiuery: 233 (3LEEDIVSVTHKAIEKETE- 
LDSLKDKLKKAd3HEREd3LECt3LKTEKDEKELYKVHLKNTE 2T1 
45 + + A++K+ + D+L+ +L++ E E L+ +L+ D 

YK K + 

Sbjct: LJDT VAELKLNAMKKD13DKTPTLEHELRR EVEDLKLRL(3NAAJ>H 

YKEKFKECfl M57 

50 Query: 2T2 

IENTKLnSEV(2TLKNLDGNKESVITHFKEEIGRLc3LCLAEKENL(3RTFLLTTSS<EDTCF 351 

+ L ++ L + N +V T + + G Q N T 

T + + S D 

Sbjct: M5S RLflKt3INKLSD(3SANNNNVFT KKTGNddKVNDASVN 

55 TDPATSASTVD 5DM 

tfuery: 352 LKE(3LRKAEEt3V(3- 

ATR(3EVVFLAKELSDAVNVRDRTnAI>LHTARLENEKVKK(3LADAVA MID
+K AE T+ +V + KE + + D + + L + + K

+ LA 

Sbjct: 505 

VKPSPSAAEADFDIVTKG(3VCEf1TKEIADICTEKYNKCK^LL(3DEKAKCNKYADELAKf1EL SbM 


(3uery : Mil ELKLNAflKKDflDKTDTLE HELRRE VED-LKLRLflllA AD-- 

HYKEKFKEC<3-RLl3K Mbl 

+ K + K + E EL + R + E + + +++ AD Y ++ + 

R + 

10 Sbjct: 5b5 

KUKE(3VKIAENV<LELAEV(3DNYKELKRSLENPAERKnEl>GADGAFYPDEI(3RPPVRVPS bEM 

Query' MbE dINKLSDdSANNNNVFTKKTG 

NdtfKVNDASVNTDPATSASTVDVKPSPSAAEAD 515 
15 + N + a A N F + + G ++ D +V T P + + 

+ + + 

Sbjct: bES UGLEDNVVCSdPARN 

FSRPDGLEDSEDSKEDENVPTAPDPPStfHLRGHGTGFCFDSS bfll 

20 fluery: 51b FDIVTKG<2VCEM 557 

FD+ K +CE+ 
Sbjct: bfiE FDVHKKCPLCEL b^3 

25 Pedant information for DKFZphamyE_Ebn -i frame E 

Report for DKFZphamyE_Ebl c l . E 



30 



ELENGTHJ ?fl c l 

EMU J T0fi77-M7 



EpIJ 5-3D 

EH0M0L1 TREMBL : HS33fl211_l product: n taxl-bi nd ing protein 

35 TXBP151^=i Homo sapiens taxi-binding protein TXBP151 mRN A . 
complete cds- D-D 

EFUNCATD ^ unclassified proteins ES. cerevisiae. YORElbcIB 

3e-lM 

EFUNCAT3 Ofl-0? vesicular transport (golgi network-, etc.) ES- 
40 cerevisiae. YDL05fiwH Ee-13 

EFUNCAT3 30-03 organization of cytoplasm ES- cerevisiae-* 
YDL05fiwJ Ee-13 

CFUNCATJ OT-ID nuclear biogenesis ES- cerevisiaei YDR35bw3 

Me-13 

45 EFUNCATJ 30. OM organization of cytoskeleton IS - cerevisiae. 

YDR35bw3 Me-13 

IEFUNCAT3 03-EE cell cycle control and mitosis IS- cerevisiae. 
YDR35bu3 Me-13 

EFUNCATH 11- DM dna repair (direct repair, base excision repair 
50 and nucleotide excision repair) ES. cerevisiae. YKRO^uO 7e-lE 
EFUNCAT2 30-10 nuclear organization ES. cerevisiae. YKR0T5w3 
7e-lE 

EFUNCATJ 03. E5 cytokinesis ES- cerevisiae. YHRDS3w MY01 - 
myosin-1 isofornO be-11 
55 EFUNCAT J Ofi • EE cytoskeleton-dependent transport ES- cerevisiae. 
YHRDE3W flYOl - myosin-1 isoformJ be-11 

EFUNCAT3 D3-0M budding, cell polarity and filament formation 
ES- cerevisiae. YHR023w NY01 - myosin-1 isoformll be-11 
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EFUNCAT3 1 genome replication! transcript ion -» recombination and 
repair Ell- jannaschii-. MJ13EB3 3e-0S 

EFUNCAT3 Tfi classification not yet clear-cut ES- cerevisiae-. 

YJR134c3 Me-Ofl 

5 EFUNCAT3 DB-IT recombination and dna repair ES- cerevisiae-. 

YNLE50w3 Ee-07 

EFUNCAT3 03-13 meiosis DCS - cerevisiaei YNLB50w3 Be-D7 
EFUNCAT3 03-01 cell growth ES- cerevisiaei YNL07^c3 Ee-Ob 
EFUNCAT3 03-07 pheromone response-, mating-type determination-. 
10 sex-specific proteins ES- cerevisiae-i *YNL07Tc3 Ee-Ob 

EFUNCAT3 Ofi-TT other intracel lular-transport activities ES- 
cerevisiae-i YNL07Tc3 Be-Db 

EFUNCAT3 0^-13 biogenesis of chromosome structure ES - 

cerevisiaei YLR0obw3 5e-0b 
15 EFUNCAT3 11-01 stress response ES- cerevisiae-. YPRlMlcJ Ee-05 
EFUNCAT3 Ob-ID assembly of protein complexes ES- cerevisiaei 

YPRimcJ Be-D5 

EFUNCAT3 03-BB-01 cell cycle check point proteins ES- 
cerevisiae-. YGL0fibw3 Be-05 
20 EFUNCAT3 30-05 organization of centrosome ES- cerevisiae-. 
YPRimcD Be-05 

EFUNCAT3 OB-lb extracellular transport ES- cerevisiae-. 

Y0R3Bbw3 le-DM 

EFUNCAT3 0^-B5 vacuolar and lysosomal biogenesis ES- 
25 cerevisiae-. Y0R3Bbw3 le-04 

EFUNCAT3 30-lb mitochondrial organization ES- cerevisiae-. 
YALDlluO Ee-OM 

EFUNCAT3 Db-07 protein modification (glycolsylation-. acylation-i 
myristylation-i palmitylationn f arnesylation and processing) 
30 - ES- cerevisiae-. YKLB01c3 Be-04 

EFUNCAT3 e amino acid metabolism and transport EN- genital i urn -» 
I1G04B3 4e-Q4 

EFUNCAT3 30-13 organization of chromosome structure ES - 
cerevisiae-. YDRB55w3 ?e-D4 
35 EFUNCAT3 n secretion and adhesion Ell- jannaschii-, MJOBTU 

□ .001 

EFUNCAT3 05-04 translation (initiation-i elongation and 
termination) ES - cerevisiae-. YAL03Sw3 0-001 
EBL0CKS3 BL003Ebl> Tropomyosins proteins 
40 EBL0CKS3 PR00545E 
EBL0CKS3 PR00041F 

ESC0P3 dBtmab_ 1-105-4.1-1 Tropomyosin Erabbit 

(Oryctolagus cuniculus) 5e-D5 

EEC3 3-b-l-3B Myosin ATPase 5e-lb 

45 EPIRKU3 nucleus Be-35 

EPIRKU3 phosphotransferase 5e-10 

EPIRKU3 duplication Be-OT 

EPIRKU3 citrulline 76-0^ 

EPIRKU3 tandem repeat Ee-13 

50 EPIRKU3 heterodimer Be-Dfi 

EPIRKU3 heart Ee-11 

EPIRKU3 endocytosis 3e-lD 

EPIRKU3 polymorphism le-O 6 * 

EPIRKU3 transmembrane protein be-lB 

55 EPIRKU3 ser ine/threonine-specif ic protein kinase 5e-10 

EPIRKU3 cell wall 76-0=1 

EPIRKU3 zinc finger 3e-10 

EPIRKU3 surface antigen be-06 
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EPIRKUD DNA binding be-lE 

EPIRKUD metal binding 3e-lD 

EPIRKUD muscle contraction Ee-13 

EPIRKliO brain Be-Dfi 

5 EPIRKUD acetylated amino end Me-DT 

EPIRKUD actin binding Se-lb 

EPIRKUD endoplasmic reticulum Me-CH 

EPIRKUD mitosis 3e-15 

EPIRKUD microtubule binding 3e-15 

10 EPIRKUD ATP Se-lb 

EPIRKUD chromosomal protein Ee-DB 

EPIRKUD receptor L4e-10 

EPIRKliO thick filament Ee-13 

EPIRKUD phosphoprotein 5e-lb 

15 EPIRKUD glycoprotein Me-ID 

EPIRKUD skeletal muscle 7e-ll 

EPIRKUD calcium binding 7e-D^ 

EPIRKUD alternative splicing 3e-13 

EPIRKUD DNA condensation Ee-06 

20 EPIRKUD coiled coil 5e-lL 

EPIRKUD P-loop Se-lb 

EPIRKtO heptad repeat 3e-13 

EPIRKUD methylated amino acid Ee-13 

EPIRKUD basement membrane le-D^ 

25 EPIRKUD immunoglobulin receptor Ee-O^ 

EPIRKUD peripheral membrane protein 3e-10 

EPIRKUD cardiac muscle Ee-11 

EPIRKUD extracellular matrix le-OT 

EPIRKUD hydrolase 5e-lb 

30 EPIRKliO microtubule le-11 

EPIRKUD muscle le-D^ 

EPIRKUD membrane protein le-D^ 

EPIRKliO EF hand 7e-0T 

EPIRKliO protein biosynthesis He-O 1 ^ 

35 EPIRKliO cytoskeleton 3e-13 

EPIRKUJ hair 7e-m 

EPIRKliO Golgi apparatus le-11 

EPIRKliO calmodulin binding 3e-lD 

ESUPFAMD myosin heavy chain 5e-lb 

40 ESUPFAMD conserved hypothetical PUS protein Me-10 

ESUPFAfO IgA Fc receptor 76-0^ 

ESUPFAMD centromere protein E 3e-15 

ESUPFAMD unassigned Ser/Thr or Tyr-specific protein kinases Se- 
ID 

45 ESUPFAMD calmodulin repeat homology 76-0^ 

ESUPFAMJ myosin motor domain homology 5e-lb 

ESUPFAMD alpha-act inin actin-binding domain homology 5e-10 

ESUPFAMD hypothetical protein MJO^m Me-Ofl 

ESUPFAMD tropomyosin 

50 ESUPFAMD plectin 5e-lD 

ESUPFAMD trichohyalin 7e-0^ 

ESUPFAMD pleckstrin repeat homology le-06 

ESUPFAMD ribosomal protein SID homology 5e-10 

ESUPFAMD giantin Me-13 

55 ESUPFAMD protein kinase homology Se-ID 

ESUPFAMD protein kinase C zinc-binding repeat homology le-Ofl 

ESUPFAMD kinesin motor domain homology 3e-15 

ESUPFAMD human early endosome antigen 1 3e-10 






CSUPFArO myosin riY02 ae-0fl 

ESUPF APO unassigned kinesin-related proteins le-lD 

ESUPFAMID MS protein 3e-lD 

ISUPFAfU cytoskeletal keratin ^-07 

EKliO All_Alpha 

CKU3 L0UL.C0I1PLEXITY 3-3Q X 

EKIiO C0ILED_C0IL BS-lfl X 



10 SEC f1TSFfiEVPL«3TSNFAHVIFc3NVAKSYLPNAHLECHYTLTPYIHPHPKl>UVGIFKVGLJSTA 

SEG 

PRD ccceeeeeccccceeeeeccccccccccccceeeeeccccccccccccceeeeeeecccc 
COILS 



15 

SE(3 RDYYTFLWSPNPEHYVEGSTVNCVLAF(2GYYLPNDDGEFY(3FCYVTHKGEIRGASTPF<2F 

SEG 

PRD eeeeeeeecccccccccccccceeeecccccccccccceeeeeeeccccccccccccccc 
COILS 

20 

SE<2 RASSPVEELLTI1EI>EGNSI>I1LVVTTKAGLLELKIEKTnKEKEELLKLIAVLEKETAi3LRE 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
25 COILS 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

. SEfi (3VGRI1ERELNHEKERCDt3Lt3AE(3KGLTEVT(3SLKI1ENEEFKKRFSDATSKAH(3LEEDIVS 

SEG 

30 PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

CCCCCCCCCCCCCCCCCCC CCCCCCCC 

SEG VTHKAIEKETELDSLKDKLKKAl3HERE(3LEC(3LKTEKDEKELYKVHLKNTEIENTKLf1SE 

35 SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

40 SEC V(3TLKNLDGNKESVITHFKEEIGRL(2LCLAEKENL(3RTFLLTTSSKEDTCFLKEfiLRKAE 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhn 
COILS 

CCCCCCCCCCC 

45 

SE<2 E(3V(3ATRc3EVVFLAKELSDAVNVRDRTI1ADLHTARLENEKVKK(3LADAVAELKLNAnKKD 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

50 CCCCCCCCCCCCCCCCCCCCCCC 

SEfl (2DKTDTLEHELRREVEDLKLRL(3MAADHYKEKFKEC(3RLl3K(3INKLSDfiSANNNNVFTKK 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhn 
55 COILS 



SEr3 TGNfl(3KVNDASVNTDPATSASTVDVKPSPSAAEADFDIVTKG(3VCEriTKEIADKTEKYNK 
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SEG 

PRD hhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 



5 

SE(2 CKt3LL(3DEKAKCNKYAI)ELAKriELKlilKEc3VKIAENVKLELAEV(3DNYKELKRSLENPAER 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

10 CCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SE(3 Kf1EGd3NS(3SP(3CFKTCSE(2NGYVLTLSNAt3PVL(3YGNPYAS(3ETRDGADGAFYPDEI(2RP 

SEG 

PRD hhhhhhcccchhhhhhhhhhheeeecccccGeeecccccccccccccccccccccccccc 
15 COILS 



SEC? PVRVPSWGLEDNVVCS<3PARNFSRPDGLEDSEDSKEDENVPTAPDPPS<2HLRGHGTGFCF 

SEG 

20 PRD ccccccccccceeeeccccccccccccccccccccccccccccccccccccccccccccc 
COILS 



SEr3 DSSFDVHKKCPLCELf1FPPNYD(3SKFEEHVESHliJKVCPnCSE(3FPPDYD(3(3VFERHV(3TH 

25 SEG 

PRD ccccccccccccccccccccccchhhhhhhhhhhhccccccccccccchhhhhhhhhhhh 
COILS 



30 SE(3 FDflNVLNFD 

SEG 

PRD hcceeeccc 
COILS 

35 

(No Prosite data available for DKFZphamyE^bn - 2 ) 
(No Pfam data available for DKFZphamyB_5bn . 2 ) 
group: metabolism

DKFZphamy2_2c22 encodes a novel 364 amino acid protein with
similarity to the 1-acyl-glycerol-3-phosphate acyltransferase of
Zea mays.

It contains one leucine zipper. The protein is believed to play a
role in fatty acid metabolism. It is ubiquitously expressed, with a
slight predominance in uterus, placenta and foreskin.

The new protein can find application in modulation of fatty acid
metabolism and as a new enzyme for biotechnological production
processes.






weak similarity to 1-acyl-glycerol-3-phosphate acyltransferase
(Zea mays)

perhaps complete cds.

25 

Sequenced by MediGenomix 

Locus: /map= n fl" 

Insert length: 3403 bp

Poly A stretch at pos. 3373, polyadenylation signal at pos. 3351
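
The report gives positions for a poly A stretch and a polyadenylation signal without stating how they were located. The following is a minimal sketch of one way such positions could be found, assuming the signal is the canonical AATAAA hexamer and that a poly A stretch means a run of at least ten A residues. Both criteria, the function names and the toy sequence are illustrative assumptions and are not taken from the specification.

```python
import re


def find_polya_signals(seq: str, hexamer: str = "AATAAA") -> list[int]:
    """Return 1-based start positions of every (possibly overlapping) hexamer hit."""
    seq = seq.upper().replace(" ", "")
    positions = []
    i = seq.find(hexamer)
    while i != -1:
        positions.append(i + 1)          # convert 0-based index to 1-based position
        i = seq.find(hexamer, i + 1)     # step by one so overlapping hits are kept
    return positions


def find_polya_stretch(seq: str, min_len: int = 10):
    """Return the 1-based start of the first run of >= min_len A's, or None."""
    seq = seq.upper().replace(" ", "")
    match = re.search("A{%d,}" % min_len, seq)
    return match.start() + 1 if match else None


# Toy sequence (hypothetical, not from the listing): signal, spacer, poly A tail.
toy = "GGC" + "AATAAA" + "CTGTCTCT" + "A" * 12
print(find_polya_signals(toy))
print(find_polya_stretch(toy))
```

Spaces are stripped first because the listing prints the sequence in blocks of ten bases.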



1 

35 51 
1D1 
151 
201 
251 

40 3D1 
351 
HD1 
SI 
SD1 

45 551 
hOl 
b51 
701 
751 

50 8D1 
651 
^01 
151 
1001 
55 1051 
1101 
1151 
12D1 



AGATGCTGCT 
CCCAGCGTCG 
GGTCTGGCGG 
TGGACGACCG 
GAG AATTACA 
TA A AG A AAAT 
TTGTTGCTGA 
TACGTGCTGA 
TGCTCAGCAT 
AAGAGATGCG 
TATCTTGTGA 
AGTCCTTTCA 
TAAA AC ATGT 
TGCATGAAGA 
AGGGA AAGAC 
TTCTCTGCAA 
AAA A AAGATG 
ACGTTTCGAA 
ATCCAGAAAG 
AGTATCAAGA 
AGGCATGCTT 
TATATGGAAC 
AGTAGCTGTC 
CTGCACATGA 
AGCCTTGTTG 



GTCCCTGGTG 
TGCTCCTGGG 
CTGCTCTCCG 
GCTCTACTGC 
CCGGGGTCCA 
ATAATATATT 
CATCTTGGCC 
AAGAAGGGTT 
GGAGGAATCT 
AAACAAGTTG 
TTTTTCCAGA 
GCTAGTCAGG 
GCTAACACCA 
ATTATTTAGA 
GATGGAGGGC 
AGAATGTCCA 
TCCCAGAAGA 
ATCAAAGAT A 
AAGAAAA AGA 
AGACTTTACC 
ATGACCGATG 
CCTACTTGGC 
TCCAGACAGT 
CATCAAATTG 
ATTGAAGATT 



CTCCACACGT 
CACGGCGCCC 
CCTTCCTGCC 
GTCTACCAGA 
GAT ATTGCTA 
TAGCAAATCA 
ATCAGGCAGA 
AAAATGGCTG 
ATGTAA AGCG 
CAGAGCTACG 
AGGTACAAGG 
CATTTGCTGC 
CGAATAAAGG 
TGCAATTTAT 
AGCGAAGAGA 
AAAATTCATA 
ACAAGAACAT 
AGATGCTTAT 
TTTCCTGGGA 
ATCAATGTTG 
CTGGAAGGAA 
TGCCTGTGGG 
GGGATGTGCT 
TTTCCTGAAT 
GGATAATAGA 



ACTCC ATGCG 
ACCTACGTGT 
CGCCCGCTTC 
GCATGGTGCT 
TATGG AGATT 
TCAAAGCACA 
ATGCGCTAGG 
CCATTGTATG 
CAGTGCCAAA 
TGGACGCAGG 
TATAATCCAG 
CCAACGTGGC 
CAACTCACGT 
GATGTTACGG 
GTCACCGACC 
TTCACATTG A 
ATGAG A AGAT 
AGAATTTTAT 
AAAGTGTTAA 
ATCTTA AGTG 
GCTGTATGTG 
TTACTATTAA 
ACATTGTCTA 
TTATTA AGGA 
ATTTGTGACG 



CTACCTGCTG 
TGGCCTGGGG 
TACCAAGCGC 
CTTCTTCTTC 
TGCCAAA AAA 
GTTGACTGGA 
ACATGTGCGC 
GGTGTTACTT 
TTTAACGAGA 
AACTCCAATG 
AGCAAACAAA 
CTTGCAGTAT 
TGCTTTTGAT 
TGGTTTATGA 
ATGACGGAAT 
TCGTATCGAC 
GGCTGCATGA 
GAGTCACCAG 
TTCCAAATTA 
GTTTGACTGC 
AACACCTGGA 
AGCATAGACA 
TTTTTGGCGG 
GTGTAAATA A 
A AAGCTGAT A 
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1551 TGCAATGGTC TTGGGC AAAC ATACCTGGTT GTACAACTTT AGCATCGGGG 
13D1 CTGCTGGAAG GGTAAAAGCT AAATGGAGTT TCTCCTGCTC TGTCC ATTTC 
1351 CTATGA ACTA ATG AC A ACTT GAGA AGGCTG GGAGGATTGT GTATTTTGCA 
14D1 AGTCAGATGG CTGC ATTTTT GAGCATTAAT TTGC AGCGTA TTTC ACTTTT 
5 1451 TCTGTT ATTT TCA ATTTATT ACAACTTGAC AGCTCC A AGC TCTTATTACT 
15D1 AAAGTATTTA GT ATCTTGC A GCTAGTTAAT ATTTCATCTT TTGCTTATTT 
1551 CTACAAGTCA GTGA AATAAA TTGTATTTAG GAAGTGTCAG GATGTTCAAA 
lfc>01 GGAAAGGGTA A A A AGTGTTC ATGGGGAAAA AGCTCTGTTT AGCACATGAT 
lb51 TTTATTGTAT TGCGTTATTA GCTGATTTTA CTCATTTT AT ATTTGCAAAA 

10 1701 TAAATTTCTA AT ATTTATTG AAATTGCTTA ATTTGC AC AC CCTGT ACACA 
1751 CAGAAAATGG TATAAAATAT GAGAACGAAG TTTAAA ATTG TGACTCTGAT 
IflOl TCATTATAGC AGA ACTTTA A ATTTCCCAGC TTTTTGAAGA TTTAAGCTAC 
1A51 GCTATT AGTA CTTCCCTTTG TCTGTGCC AT A AGTGCTTG A A AACGTTA AG 
MOl GTTTTCTGTT TTGTTTTGTT TTTTT AATAT CAAAAG AGTC GGTGTG AACC 

15 1^51 TTGGTTGGAC CCCAAGTTCA CAAGATTTTT A AGGTG ATGA GAGCCTGCAG 
2DD1 ACATTCTGCC TAGATTTACT AGCGTGTGCC TTTTGCCTGC TTCTCTTTGA 
2051 TTTCACAGAA TATTCATTCA GAAGTCGCGT TTCTGTAGTG TG6TGGATTC 
2101 CCACTGGGCT CTGGTCCTTC CCTTGGATCC CGTCAGTGGT GCTGCTCAGC 
2151 GGCTTGCACG CAG ACTTGCT AGGAAGAAAT GCAGAGCC AG CCTGTGCTGC 

20 2201 CCACTTTCAG AGTTGAACTC TTTAAGCCCT TGTGAGTGGG CTTCACCAGC 
2251 TACTGCAGAG GCATTTTGC A TTTGTCTGTG TC A AGAAGTT CACCTTCTCA 
2301 AGCCAGTGAA AT ACAG ACTT AATTTGTCAT GACTGAACGA ATTTGTTTAT 
2351 TTCCCATTAG GTTTAGTGGA GCTACACATT A ATATGTATC GCCTTAGAGC 
2401 A AGAGCTGTG TTCCAGGAAC CAGATCACGA TTTTTAGCCA TGGAACAATA 

25 2M51 TATCCCATGG GAGAAGACCT TTCAGTGTGA ACTGTTCTAT TTTTGTGTTA 
2501 TAATTTAAAC TTCGATTTCC TCATAGTCCT TT A AGTTGAC ATTTCTGCTT 
2551 ACTGCTACTG GATTTTTGCT GCAGAAATAT ATC AGTGGCC CACATTAAAC 
2fc01 ATACCAGTTG GATCATGATA AGCAAAATGA A AGA AATA AT G ATT A AGGGA 
2L51 A A ATTAAGTG ACTGTGTTAC ACTGCTTCTC CCATGCCAGA G A AT A A ACTC 

30 2701 TTTCA AGCAT CATCTTTGA A GAGTCGTGTG GTGTGAATTG GTTTGTGTAC 
2751 ATT AGA ATGT ATGCACACAT CCATGGACAC TCAGGATATA GTTGGCCTAA 
2A01 T A ATCGGGGC ATGGGTAAA A CTT ATGAAAA TTTCCTCATG CTGA ATTGTA 
2fi51 ATTTTCTCTT ACCTGTAAAG TAA AATTTAG ATC A ATTCCA TGTCTTTGTT 
2101 AAGTACAGGG ATTTAATATA TTTTGAATAT AATGGGTATG TTCTAA ATTT 

35 2^51 GAACTTTGAG AGGCAATACT GTTGGA ATTA TGTGGATTCT AACTC ATTTT 
3001 AACAAGGTAG CCTGACCTGC ATAAGATCAC TTG AATGTTA GGTTTC ATAG 
3051 AACTATACTA ATCTTCTCAC AAA AGGTCTA T A A A ATACAG TCGTTGAAA A 
3101 A A ATTTTGTA TCA A AATGTT TGGAAAATTA GAAGCTTCTC CTTAACCTGT 
3151 ATTGATACTG ACTTGAATTA TTTTCTAAAA TTAAGAGCCG TATACCTACC 

40 3201 TGTAAGTCTT TTCACATATC ATTTAAACTT TTGTTTGTAT TATTACTGAT 
3251 TTAC AGCTTA GTTATTA ATT TTTCTTTATA AGAATGCCGT CGATGTGCAT 
3301 GCTTTT ATGT TTTTCAGAA A AGGGTGTGTT TGGATGAAAG TAA A A A AAA A 
3351 AAATAAAATC TTTCACTGTC TCT AAAAAAA A A AG A A A AAA AAAAAAAAAA 
. 3401 AAA 

45 

BLAST Results 



50 No BLAST result 



Medline entries 



55 

No Medline entry 
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Peptide information for frame 3 



10 



35 



55 



ORF from 3 bp to 1094 bp; peptide length: 364
Category: similarity to known protein
Classification: Metabolism
Prosite motifs: LEUCINE_ZIPPER (106-127)
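
The ORF coordinates and the peptide length are tied together by simple arithmetic, which makes a convenient consistency check on the report. The sketch below assumes 1-based, inclusive coordinates that exclude the stop codon; the function name is hypothetical. Reading the ORF as positions 3 to 1094 gives (1094 - 3 + 1) / 3 = 364 codons, which agrees with the stated peptide length.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of codons in a 1-based, inclusive ORF interval (stop codon excluded)."""
    span = orf_end - orf_start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


print(peptide_length(3, 1094))
```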



1 MLLSLVLHTY SNRYLLPSVV LLGTAPTYVL AUGVURLLSA FLPARF YC2AL 
51 DDRLYCVYflS MVLFFFENYT GVGILLYGDL PKNKENIIYL ANHflST VDUI 
1D1 VADIL AIRflN ALGHVR YVLK EGLKULPLYG CYFAtfHGGIY VKRSAKFNEK 
151 EMRNKL<2SYV D AGTPI1YLVI FPEGTRYNPE l2TKVLSAS<2A FAAGRGLAVL 
15 EDI KHVLTPRIKA THVAFDCMKN YLD AIYDVTV VYEGKDDGGdJ RRESPTMTEF 

251 LCKECPKIHI HIDRIDKKDV PEE(3EHI1RRU LHERFEIKDK MLIEF YESPD 
3D1 PERRKRFPGK SVNSKLSIKK TLPSMLILSG LTAGMLHTDA GRKLYVNTUI 
351 YGTLLGCLUV TIKA 

20 

BLASTP hits 

No BLASTP hits available 

25 

Alert BLASTP hits for DKFZphamy2_2c22, frame 3
No Alert BLASTP hits found
Pedant information for DKFZphamy2_2c22, frame 3



Report for DKFZphamy2_2c22.3



ELENGTHJ 3tM 
OIliU H2072.M7 



IEpI3 T-lfl 

EHOflOLJ TREf1BL:CEAF313b_l gene: "F2fiB3.5"=i Caenorhabdi t is 

40 elegans cosmid F26B3- 2e-3b 

EFUNCATJ unclassified proteins ES- cerevisiae-. YDRDlflcD 

7e-13 

EFUNCATU Dl-Ob-Dl lipid-, fatty-acid and sterol biosynthesis 

ES- cerevisiae-i YDLDSEcJ Me-05 
45 EFUNCATJ 30.^ other cellular organization ES. cerevisiae-. 

YDLD52cJ Me-05 

EBL0CKS3 BLD12b3A 

EBLOCKSJ BPDCHfl^A 

EPIRKblJ transmembrane protein 2e-ll 

50 ESUPFAfU probable membrane protein YBR0M2c 2e-ll 
EPROSITEJ LEUCINE_ZIPPER 1 

ILK IO Alpha_Beta 

EKliU L0U_C0riPLEXITY 3-57 V. 



SEfl HLLSLVLHTYSnRYLLPSVVLLGTAPTYVLAWGVliIRLLSAFLPARFY(3ALDDRLYCVY(3S 

SEG 

PRD ccchhhhhhhhhccccccceeecccceeeccchhhhhhhhhhhhhhhhhhhhhhhhhhhh 
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SE<2 nVLFFFEN YTG V(3ILLYGDLPKNKENIIYL ANH(3STVDUIVAI)IL AIR(3NALGHVRYVLK 

SEG 

PRD hhhhhhhceeeeeeeeecccccccccGeGGecccchhhhhhhhhhhhhccccchhhhhhh 

5 

SEG EGLKULPLYGCYFA(3HGGIYVKRSAKFNEKEnRNKL(3SYVDAGTPnYLVIFPEGTRYNPE 

SEG 

PRD hhhccccccceGGCCceeeeeeccccccchhhhhhhhhhhcccccGGGeeecccccchhh 

10 SE(2 (3TKVLSASt3AFAA«3RGLAVLKHVLTPRIKATHVAFI>CnKNYLI>AIYI>VTVVYEGKDDGG(2 

SEG 

PRD hhhhhhhhhhhhhhhccccccGGGGCccchhhhhhhhhhhccGGeceeGGeGcccccccc 

SEta RRESPTnTEFLCKECPKIHIHIDRIDKKDVPEE(3EHnRRliILHERFEIKDKf1LIEFYESPD 

15 SEG xxxxxxxxxxxxx 

PRD cccccchhhhhccccceGGGGGCCCCCCCCCCCchhhhhhhhhhhhhhhhhhhhhhhccc 

SEG PERRKRFPGKSVNSICLSIKKTLPSMLILSGLTAGHLriTDAGRKLYVNTUIYGTLLGCLUV 

SEG 

20 PRD cccccccccccchhhhhhhhchhhhhchhhhhhhhhhcccccGGGGGGGGchhhhhhhhh 

SEfl TIKA 

SEG 



25 



PRD hccc 



Prosite for DKFZphamy2_2c22.3
PS00029 106->127 LEUCINE_ZIPPER PDOC00029

(No Pfam data available for DKFZphamy2_2c22.3)
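
The Prosite leucine zipper entry (PS00029) is the pattern L-x(6)-L-x(6)-L-x(6)-L, that is, four leucines spaced seven residues apart over a 22-residue span. The sketch below shows how such a pattern could be scanned for with a regular expression. The function name and the toy sequence are hypothetical, and the scan is not claimed to be the procedure used to generate the report.

```python
import re

# Lookahead wrapper so that overlapping occurrences are all reported.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def leucine_zippers(protein: str) -> list[tuple[int, int]]:
    """Return 1-based (start, end) spans of every L-x(6)-L-x(6)-L-x(6)-L match."""
    protein = protein.upper().replace(" ", "")
    return [(m.start() + 1, m.start() + 22) for m in LEUCINE_ZIPPER.finditer(protein)]


# Toy peptide (hypothetical): leucines at positions 3, 10, 17 and 24.
toy = "MK" + "LAAAAAA" * 3 + "L" + "GG"
print(leucine_zippers(toy))
```

Each reported span covers 22 residues, the same width as the 106->127 interval in the Prosite line.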
group: signal transduction

DKFZphamy2_2f18 encodes a novel 215 amino acid protein with
similarity to sodium channel protein beta1 of Rattus norvegicus.

The sodium channel protein beta 1 of Rattus norvegicus is crucial
in the assembly, expression, and functional modulation of the
heterotrimeric complex of the rat brain sodium channel. The
expression of the new protein seems to be restricted to brain;
all matching ESTs isolated so far derive from there.

The new protein can find application in modulating the sodium
channel beta 1, studying the expression profile in
neurodegenerative diseases and of amygdala-specific genes.

20 



similarity to sodium channel protein beta1 (Rattus norvegicus)
Pedant: SIGNAL_PEPTIDE

Sequenced by MediGenomix
Locus: unknown

Insert length: 4052 bp

Poly A stretch at pos. 4035, no polyadenylation signal found



1 CAGGGCTGAC 

35 51 ACCTGCCACC 

101 GAAGGGGCTC 
151 GAACGGGAGC 
2D1 CCGAGGGCGG 
251 ACGGGCGGCC 

40 301 CAGTTAGGGC 

351 GAACGCAACC 
M01 CACAGCCTGG 
i*51 ACCCTGGGCG 
501 CGGGGTGGGC 

45 551 GGGCGCGGAG 

bDl TCGCTTAGGG 
t,51 CCCAGGCACC 
701 GGGCAGTTCG 
751 CCTTGACCGA 

50 601 AAGATGCCTG 

flSl CTACTGGGTC 
^□1 CGGAGGCCGT 
T51 AAGAGAGAGG 
1DD1 CGAGGGCGGT 

55 1D51 AGGTGGAGAG 
11Q1 CTGCAGGACG 
1151 CCTCTACACC 
1201 CCTTTGTGAA 



AGCACACACG 
CAGGGTCGGG 
CCCCTCTACA 
TGCCGCTTCC 
GCGTGGACGG 
CCGGCGGCTT 
GGACGAAGCA 
GATCCTGGGG 
CTGCTAGGCC 
CAAACGAGCG 
GGGGAGGCGA 
CGGCTGATCA 
CCCAAAGCCC 
GGTGCTCGGC 
TCCCAAAGGG 
GGGAATCTCT 
CCTTCAATAG 
AGTGTCTGCT 
GCAGGGCAAC 
AGGTGGAGGC 
AAAGATTTCC 
CCCCTTTCAG 
TGTCCATCAC 
TGCAATGTGT 
GACGACGCGG 



GCCTGGGGGC 
GCCCCGCACC 
CCCACCCCCC 
TTCCCGGCCC 
GACCGACGTG 
CGGGAGTGGG 
GGAGCCGCGG 
AGGCGAGAGG 
AGC AGTGCGA 
AGGCAGGGGC 
CTGTCCGTGG 
GCTCCCTCGA 
CCGCCCGGCT 
CCTTCCTTCG 
TTTCCTCGAA 
CTGTGTAGCC 
ATTGTTTCCC 
TCCCTGTGTG 
CCC ATGAAGC 
CACCACGGTG 
TTATTTACGA 
GGGCGCCTGC 
TGTGCTCAAC 
CCCGGG AGTT 
CTGATCCCCC 



CTAGAG AAGG 
ATCCGGGGGC 
AACCTCTGAC 
CGCTGCACCT 
GAACGC ATTC 
GTCACGCCCA 
GGCTGGGAGG 
TGAATC AACC 
CTCCCTTCCG 
GCGAGTGGAA 
TGCTG AGCGC 
ACTGGGGAGG 
CCAAA AGCTC 
GTCAGAAAGT 
AGA ATCTGAG 
TTGGA AGCCG 
CTGGCTTCTC 
TGTGGA AGTG 
TGCGCTGCAT 
GTGGAATGGT 
GTATCGGAAT 
AGTGGAATGG 
GTCACTCTGA 
TGAGTTTG AG 
TAAGAGTC AC 



ATTGCTGATC 
GAGCTCCCGG 
ATCGCCGGCC 
CCCCAGGGAG 
TGTAGCCCAG 
GCTGGAGAAG 
ATTCCAGTCG 
TGGACCCTTC 
AGCTGAGCTT 
GCTGGAGTTC 
CGGCGAGAGC 
TCCAGTGGGG 
CCAGGGCCTC 
CGCCCCCTGG 
AGGGCGCAGT 
CCAGCCCCAG 
TCGTGCTTAT 
CCCTCGGAGA 
CTCCTGCATG 
TCTACAGGCC 
GGCCACCAGG 
CAGCAAGGAC 
ACGACTCTGG 
GCGCATCGGC 
CGAGGAGGCT 
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1251 GGAGAGGACT TCACCTCTGT 
13D1 GGTCTTCCTC ACCTTGTGGC 
1351 AGGTCTCA A A AGCCGAAGAG 
1M01 GCCATCCCAT CTGAGAACA A 
5 1M51 GAACAGGAGC AGTGTGACAT 
1501 ATCCCATGTT CAGC AATGTC 
1551 CATCGCTTCC CTTCATGCAT 
IbOl ATCCACCTGC CTCTGAGCTT 
IbSl TCTACGCACC ATAAGACTCT 
10 17D1 AGACTCAACC TCACCCTCTC 
1751 ACTGG ATTTC TCCCCTGTGC 
IflOl AGCACCTCCC TCTGCCCTTA 
IflSl TGCAAGAGAA TGGAAGTCTT 
1T01 GTTAAAGCAA AAGTGTGTCA 
15 MSI CAGGTTCATG GCCC ACTTTG 
2001 AGGTTCTCCA GTGACAGA A A 
2051 GGAGCTTAGT ACTCCAGAGC 
2101 CAAGAAGACA AGGACAGGAG 
2151 GCTCTTAAAA AGTCATGCA A 

20 22D1 GCGATAATGT ATGTGTGCCC 
2251 ACTGA A ATTC CTGAGTTCTT 
2301 TCCTCATTTT ACAT AC AGG A 
2351 AATAAGAGGC TTAAG AGGAT 
2M01 GCCCAGTTTA CACTTCCTGG 

25 2M51 TTTTTTTTGT TTGAG ATGGA 
2501 GTGGTGTGAT CTCGGCTCAC 
2551 TCTCATGCCT CGGCCTCTCC 
2b01 CGCCTGGCTA AATTTTGTAT 
2b51 GCCAGGCTAG TCTTGA ACTC 

30 2701 TCCCA AAGTG CTGAGATTAC 
2751 TTTGTTTCTG AAAAGACTGA 
2fi01 CACAAGCACG GACTGGGCTG 
2fi51 AAATTGGCCA AAAAAGCAGG 
2^01 TCCTTTATGT AAAGATCTGT 

35 2^51 CATCTGAGAC TGATATTT A A 
3001 CATA AGTAAA TGAGCAGTGT 
3051 AATACTTCGC CTATGAATGC 
3101 TGAGGGAGCA GAGAAACTGG 
3151 TTTTAATTAA GTGACAGGTC 

40 3201 AAGAGAGGGG AAAG ATGCTT 
3251 GATTCAGCGA GAGAGAGGTC 
3301 AAGACCATAT TCCATAGGTT 
3351 CATTCTTCCA TCCCTAGGAA 
3M01 TTTATTTATT ATTATTTTTT 

45 3M51 TAGAGTGCAG TGGTGCAATC 
35D1 TTTAAGCAAT CCTCCCACCT 
3551 GCACCACCAT GCCTGGCT A A 
3t,01 TGCCATGTCG CCCAGGCTGA 
3bSl AGCCTCAGCC TCTCAA AGTG 

50 3701 GGCCCA AAAC CAGACCGTTA 
3751 AAATATTTGC AATA AATTCA 
3fi01 ATCCCTCTGA AATAAGGGAG 
3551 GAAAAATTGG CCTC AGTGTG 
3^01 CCACCAGCTT GCGGTAGTAG 

55 3T51 GGTGGCCCAA TAGCTCGTGT 
M001 TGCCTTTTTC TCTTCTTTTT 
M051 AA 



GGTCTC AGA A ATCATGATGT ACATCCTTCT 
TGCTCATCGA GATGATATAT TGCTACAGAA 
GCAGCCCAAG AA AACGCGTC TGACTACCTT 
GG AG A ACTCT GCGGTACC AG TGGAGGAATA 
GAGGTGGCCT GAACACCTG A GGGACTGGAC 
AATGGCATC A GGAGGGCGCC CCAAGGGCCC 
CCATTGTTCT GTTCATTC AT TCATCCATAC 
TCACCTCTGA CTCCCTAACT CCATCAGACC 
GCCAGAACTG AGAAGCCAAC ATTTCTACAT 
CTAGTTTTCC AAC A AG AC AC TCCAAAGCCA 
TCCAAATGAC TTTGTACAAG TGCTGGAGTT 
ACTGGCTGGA ACTGGTTCAT TCTCCATTAC 
AATAGAAGGA AGCAGGAGTG ATTAGTTCGG 
TGAACTTGGA TTCCCTGAAG TCAGTTTTGT 
CTACAGCATC AG AGTGAAGC ACGCCTGTCT 
GATCCTGAAG CATGGACTAA CATGCTCTCT 
TAGATCCTGA TGGGTCTCTA AGGTTCCCTC 
ACTTGGGAAG GACCAATGGT AATTTAAGTG 
CATGTTTCTG GAC ACGTTCC TGATCCTATT 
TCCCTGTGGG CACACCACCT GGGCATTAGG 
CCTCTCAAAA TTTCTGTGCA CCAGTATTAT 
GGCAACTAAG ACTCATACAG GGCTCAAC7G 
AAACTGGAGC AG AAATAAGC CTTAGGTGCT 
GATGGATGTT TTTGTTTGTT TTGTTTTTTG 
GTCTCACTCT GTC ACCTAGG CTAGAGTGCA 
TGCAACCTCT GCCTCTTGGG TTCAAGCAAT 
AGTAGCTGGG ATTACAGGTG TGCACCACCA 
TTTTAGTACA GACAGGGTTT GACTATGTTG 
CTGACCTCA A ATGACCCACC CACCTCAGCC 
AGGCGTGAGG CACTGCGCCC GGTGGATAAC 
CATTGAACTT GTCTATGGCA ATGCTTCTTT 
AGGTC AACTC TGATAGATTC AGATGACTAG 
GAGAAGAACA TGAGGT AG AC TTAAAGAACT 
GACTCTGAAA TATCCTCCAA AAGGAGAGTG 
ACTAAGAAAA ATGTTTAGTC TGAGATGGAT 
GAGAGGGGAG GG ATGGGTAG GTGCTTTCCA 
ATAATTTTCA GATTTTTTTC CCCTAGATTT 
AAAAA ACTTT AGTC A AT ATC TCGTGTTTCA 
CAAGTGTGAC ATCCTTCAGC ACCCAGGGAC 
TATGGAATGT AAGA AGATGA AGGTGACTGG 
CCTCAGACCT GGGACCTCCC TTTATAGGGA 
TAGGGCTTTA CCTTAA AAGC TCATTTTTTT 
AGTACTTAAA ACCAGACTTT TAAATTTTTA 
TGAGACAGAT TCTCACTCTG TCTCCCAGGC 
TCAGCTCACT GCAGCCTC AA CTGCCCCAGG 
CAGCCCCCAG GTA ACTGGGA CTACAGGCAT 
TTTTTGTATT TTATGTAGAG ACAGGGGTCT 
TCTTGAACTC CTGGGCTC AA GCAATCTGCC 
CTGGGATTAC AGGCCTGAGC AACTGTGCCT 
ACACATTAAA GAGTCTGATT TTGTTGAAGA 
AGACTCTTCT TATTGGTA AT TTTCCACACA 
AGGATATAGA CCTTTTTAAC TTTATAGTTA 
AAATTTTTCC AGTCCCATAG CTCATGGATG 
CAAGATGCTT ACTACCACAC CGTTTTCCTC 
ATCTAAGTTG AACCCGGCAG TATGCATGAT 
AA AAAAACCC AACTCAA A AA AAAAAAA AAA 
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BLAST Results 

5 No BLAST result 

Medline entries 

10 



Isom LL, De Jongh KS, Patton DE, Reber BF, Offord J, Charbonneau H,
Walsh K, Goldin AL, Catterall WA. Primary structure and functional
expression of the beta 1 subunit of the rat brain sodium channel.
Science 1992 May 8;256(5058)

Belcher SM, Howe JR. Cloning of the cDNA encoding the sodium
channel beta 1 subunit from rabbit. Gene 1996 May 8;170(2):285-6

McClatchey AI, Cannon SC, Slaugenhaupt SA, Gusella JF. The
30 1 



35 



40 



45 



50 



55 



Peptide information for frame 3

ORF from 804 bp to 1448 bp; peptide length: 215
Category: similarity to known protein
Classification: Transmembrane proteins unclassified

t r>, ,r, m tvii^ VCFPVCVEVP SETEAV6GNP tlKLRCISCNK 
1 MPAFNRLFPL ASLVLIYUVS J^f^XjJJ H t3EVESPF(2G RLfllilNGSKDL 

£ SEi SSS S usssffi saissts 

EDI IPSENKENSA VPVEE 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphamy2_2f18, frame 3

PIR:JC4788 sodium channel protein beta1 chain - rabbit, N = 1,
Score = 434, P = 8.3e-41
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PIR:A55?3M sodium channel-! vol tage-ga ted beta-1 chain precursor 
human-. N = 1 -> Score = MBfl-i P = 3-be-M0 

5 

PIR:AMB?37 sodium channel beta 1 subunit - rati N = li Score = 

MB 1 ! -i P = 

E-fle-MO 

10 

>PIR:JCM7flfl sodium channel protein betal chain - rabbit 
Length = 218

HSPs:

Score = 434 (65.1 bits), Expect = 8.3e-41, P = 8.3e-41
Identities = 100/214 (46%), Positives = 129/214 (60%)
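
The percentages in an HSP header are derived from the match counts and the alignment length. The sketch below reproduces them under the assumption that the report truncates to a whole percent rather than rounding. That assumption is consistent with reading this header as 100/214 (46%) and 129/214 (60%), but it is not confirmed by the specification. The function name is hypothetical.

```python
def hsp_percent(matches: int, aligned_length: int) -> int:
    """Whole-number percentage of matching positions, truncated (not rounded)."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100 * matches // aligned_length


print(hsp_percent(100, 214))  # identities
print(hsp_percent(129, 214))  # positives
```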

fluery: ID 

20 LASLVLIYUVSVCFPVCVEVPSETEAVfiGNPnKLRCISCnKREEVEATTVVEUFYRPEGG b^ 

LA + V VS + CVEV SETEAV G K+ CISC +R E AT 

Eld +R + G 
Sbjct: 5 

LAF VVGA ALVSSAWGGCVEVDSETEA VYGMTF<ILCISCKRRSETTAETFTEIiJTFR<2KGT bM 

25 

(3uery: 70 KDFL-IYEYRNGH(3EVESP — F(3GRL(2ldNGS 

KDL(2DVSITVLNVTLNDSGLYTCNVS 123 

+ + F+ I Y N + + E F + GR + UNGS KDLC3D + SI + NVT N 

SG Y C+V 
30 Sbjct: b£ 

EEFVKILRYENEVL(3LEEDERFEGRVVLJNGSRGTKI>L(2DLSIFITNVTYNHSGDY(3CHVY IBM 

(2uery: IBM 

REFEFEAHRPF VKTTRLIPLRVTEEAGEDFTS VVSEinnYIXXXXXXXXXXIEfllYCYRK 163 
35 R FE + - + I L V + + A D S+VSEIMriY + 

EM+YCY+K 
Sbjct: IBS 

RLLSFENYEHNTSVVKKIHLE VVDK ANRI>f1ASI VSEIflflYVLI VVLTIULVAEflVYCYKK IfiM 

40 (2uery: IfiM VSKAEEAA-flEN ASD YLAIPSENKEN-SA VPVEE B15 

+ + A EAA 13ENAS + YLAI SE+KEN + V V E 
Sbjct: 1A5 IAAATEAAAt2ENASEYLAITSESKENCTGV<3VAE Blfi 



Pedant information for DKFZphamy2_2f18, frame 3



Report for DKFZphamy2_2f18.3

50 

ELENGTHJ BIS 
EIllilID BM70B • MD 

CpO M-b") 

EHOIIOL]) PIR:JCM7fio sodium channel protein betal chain - 

55 rabbit 3e-Ml 

EBLOCKSJ BLDOMOID Prokaryotic sulf ate-binding proteins 
EBLOCKSID BP0057D 
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CSC0P2 dlneu 2.1-1.1-1 flyelin membrane adhesion 

molecule PD Era 2e-M3 

CPIRKU3 Schwann cell 2e-07 

EPIRKLO transmembrane protein le-MO 

5 EPIRKliD myelin 2e-07 

EPIRKliO phosphoprotein 5e-07 

EPIRKliD glycoprotein le-MD 

EPIRKIO structural protein 2e-07 

EPIRKliD muscle le-MO 

10 EPIRKliU membrane protein 5e-D7 

ESUPFAM3 immunoglobulin homology 2e-07 

ESUPFAMD myelin PD protein 2e-D7 

EPFArO 16 (immunoglobulin) superfamily 

EKtO Alljeta 

15 [KM] 3D 

EKIO SIGNAL_PEPTIDE 23 

EKIO LOU_C0riPLEXITY M-bS V, 

20 SE<3 MPAFNRLFPLASLVLIYUVSVCFPVCVEVPSETEAV(3GNPnKLRCISCriKREE\/EATTVV 

SEC • 

Ineu- CEEEECCEEEETTTbCEEECE- 

EEECCCCCCCCCEE 

25 SE<2 EUFYRPEGCKDFLIYEYRNGH(3EVESPF(36RL(3UNGSKDL(33>VSITVLNVTLNDSGLYTC 

SEG 

Ineu- 

EEEEEETTTCCCEEEEEETTEEEETTTTTTTEEECCBGGGCBCCEEECCbTTTTTEEEEE 

30 SE(2 NVSREFEFEAHRPFVKTTRLIPLRVTEEAGEDFTSVVSEinMYILLVFLTLWLLIEniYC 

SEG • xxxxxxxxxx 

Ineu- 

EE 

35 SEG YRKVSKAEEAAflENASDYLAIPSENKENSAVPVEE 

SEG 

Ineu- 



45 



(No Prosite data available for DKFZphamy2_2f18.3)

Pfam for DKFZphamy2_2f18.3



HMM_NAME IG (immunoglobulin) superfamily 



win 

*yrNgqpipssegyU)ytRweqqgRYsisif qLtlisldepeDsGtYWCmV* 
50 YRNG ++ E+ ++ R++++G ++ +++T+ +++ +DSG 

Y+C + V 

Query . 77 YRNGH(3EV-- 

ESPF(3GRLfiWNGSKDL(3I>VSITVLNVTLNDSGLYTCNV 122 

55 
DKFZphamy2_2f22



group: nucleic acid management

DKFZphamy2_2f22 encodes a novel 479 amino acid protein with
similarity to YDL153c of Saccharomyces cerevisiae.

The novel protein is ubiquitously expressed. YDL153c is involved
in transcriptional silencing.

The new protein can find application in modulation of
transcription, e.g. transcriptional silencing.

putative protein

probably complete cds.
perhaps differential polyadenylation

YDL153c is involved in transcriptional silencing

Sequenced by MediGenomix

Locus: /map="4"

Insert length: 2019 bp

Poly A stretch at pos. 2000, polyadenylation signal at pos. 1981

30 

1 GGAGTCTGCA AACTCCGGTG GT AGGGGAGC GCGCTGCTGT TTAGAGCCAC 
51 GAGTTACCGG AGCGCCTG AT TCCTGCGCCG A AGTC AGTGG TGGCCGAAAG 
101 TCCGGAGTCG CTGT A A AACC TGAGATTGTG AGCCATGGTG GGGAGATCCC 
151 GGCGGCGCGG AGC AGCTA AG TGGGCAGCTG TGCGAGCCAA GGCAGGTCCC 
35 EDI ACGCTCACCG ACG A A A ATGG AGATGATTTA GGATTGCCAC CCTCACCAGG 

E51 GGACACCAGC TACT ACC A AG ATCAGGTAGA TGACTTTCAT GAGGCACGAT 
301 CCCGGGCCGC CTT AGCTA AG GGCTGGAATG AAGTACAGAG TGGAGACGAG 
351 GAGGATGGCG AGGAGGAGGA GGAGGAGGTG CTAGCCCTAG ATATGGACGA 
401 TGAGGACGAC GAAGATGGAG GGAATGCGGG GGAGGAGGAG GAGGAGGAGA 
40 451 ATGCCGATGA TGATGGTGGG AGCTCCGTGC AAAGTGAAGC TGAGGCCTCT 

501 GTGGATCCCA GTTTGTCGTG GGGTCAGAGG AAAAAACTTT ACTATGACAC 
551 GGACTATGGT TCCAAGTCCC GAGGCCGGCA GAGTCAACAG GAGGCAGAGG 
tOl AGGAGGAAAG AGAGGAGGAG GAGGAGGCAC AGATCATTCA GCGGCGCCTA 
b51 GCCCAAGCGC TGCAAGAGGA TGATTTTGGT GTCGCCTGGG TTGAGGCCTT 
45 701 TGCAAAACCA GTGCCTCAGG TAGATGAGGC TGAGACACGG GTCGTGAAGG 

751 ATTTGGCTAA AGTTTC AGTG AAAGAGA AGC TGAAAATGTT GCGAA AGGAA 
fiOl TCACCAGAAC TCTTGGAGCT GATAGAAGAC CTGAAAGTCA AGTTG ACAGA 
fl51 GGTTAAGGAT GAGCTGGAGC CATTGTTAGA GTTGGTGGAA CAAGGGATCA 
^01 TTCCACCCGG AAAAGGAAGC CAATACTTGA GGACCAAGTA CAACCTCTAC 
50 ^51 TTGAATTATT GCTCGAACAT CAGTTTTTAT TTGATCCTGA AAGCTAGGAG 

1001 AGTCCCAGCA CATGGACATC CTGTCATAGA A AGGCTTGTT ACCTACCGAA 
1051 ATTTGATCAA CAAGCTGTCC GTTGTGGATC AGAAGCTGTC CTCAGA AATT 
1101 CGTCATCTGT TGACACTTAA GGATGATGCT GTAAAGAAAG AACTGATTCC 
1151 AAAAGCAAAA TCCACCAAGC CCAAACCAAA GTCTGTTTCA AAGACTTCTG 
55 1E01 CTGCTGCCTG TGCTGTTACA GATCTTTCTG ATGATTCTGA TTTTGATGAA 
1E51 AAAGCAAAAC TGAAGTACTA TAAAGAAATA GAAGACAGGC AAAAGCTAAA 
1301 GAGAAAGAAA GAAGAAAATA GCACTG AAGA ACAGGCTCTT GAAGATCAAA 
1351 ATGCAA AGAG AGCTATTACC TATCAAATTG CTAAAAATAG GGGACTTACT 
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mOl CCTAGGAGAA 
1451 GTTCAG A AG A 
15D1 AAGAAGAGCA 
1551 A A A A AG AGC A 
IhUl GCAGTTTTGG 
lb51 TATATAATAC 
1701 GGGTTATTGG 
1751 AG AAAAGAGA 
IfiDl TATTAAGTAT 
lfi51 CTTAGA ACAT 
nOl ATCTTATTTG 
1^51 CTAACTATAA 
2001 A A A A A A A AAA 



AGAAGATTGA 
GCC A A A ATTA 
ACGTT AT AGT 
TTAAGCTTAA 
ATCAATAAAT 
TTTAAATTTT 
ACAATTTATA 
TGATGTTGAA 
TAGCTTAGGG 
GTGTAACTTT 
TAAGGCAGCC 
TTATTGGGCC 
A A A AAA AAA 



TCGCAATCCC 
GAAGAAGAGG 
GGTG A ATT AT 
ATGAAGTTTT 
TTTT ACTTTT 
AAAAATTCTT 
AGAACT ATGG 
GTTTTCCA AT 
AAATTTCAC A 
TCACATAA AG 
TATAAAATAG 
AGATACTTGT 



AGAGTGAAAC 
CCAGGTTCGT 
CTGGCATTCG 
TGCTT AGCAT 
AACTAA AGTC 
GTCCACAAGG 
GAGCA ATATG 
ATTCTGTTGA 
GTTCATTGTG 
AGA ATGCATC 
TTCTGA AGTA 
TAATAAATGG 



ACAGAGAG A A 
G AAGTTCGT A 
TGCAGGAGTT 
A AGGTTTTTG 
ATTGTATTAA 
A AATTTGTCT 
AAGGTGCTTG 
AGTTTTCCAA 
GAGTGTT AAA 
TTTGACAGTT 
TTTTATTTAC 
GCTTA ATGTC 



BLAST Results 
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No BLAST result 



Medline entries 



25 No Medline entry 



30 
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40 



45 



Peptide information for frame 3 



ORF from 135 bp to 1571 bp; peptide length: 479
Category: similarity to unknown protein 
Classification: Nucleic acid management 



1 NVGRSRRRGA 
51 FHEARSRAAL 
1D1 EEEEEN ADDD 
151 <2(3EAEEEERE 
EDI TRVVKDL AKV 
251 VEtSGIIPPGK 
301 LVTYRNLINK 
351 VSKTSA AACA 
MD1 ALEDtfNAKRA 
MSI VREVRKEEdR 



AKUA A VRAKA 
AKGWNEVdSG 
GGSSVdSEAE 
EEEEA(3II(3R 
SVKEKLKMLR 
GSCYLRTKYN 
LSVVJX2KLSS 
VTDLSDDSDF 
ITYdlAKNRG 
YSGELSGIRA 



GPTLTDENGD 
DEEDGEEEEE 
ASVDPSLSUG 
RL At2ALt3EDD 
KESPELLELI 
LYLNYCSNIS 
EIRHLLTLKD 
DEKAKLKYYK 
LTPRRKKIDR 
GVKKSIKLK 



DLGLPPSPGD 
EVL ALDMDDE 
(3RKKLYYDTD 
FG VAUVEAF A 
EDLKVKLTEV 
FYLILKARRV 
DA VKKELIPK 
EIEDR(2KLKR 
NPRVKHREKF 



TSYYtfDtfVDD 
DDEDGGNAGE 
YGSKSRGRCS 
KPVPGVDEAE 
KDELEPLLEL 
PAHGHPVIER 
AKSTKPKPKS 
KKEENSTEEfl 
RRAKIRRRGtf 



50 



55 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphamy2_2f22, frame 3

PIR:S67701 hypothetical protein YDL153c - yeast (Saccharomyces
cerevisiae), N = 4, Score = 134, P = 1.8e-08
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PIR : TDfibTM hypothetical protein DKFZp5b4 00^2 • 1 - human 

(fragnient)i N = 

li Score = mii P = 5.fle-D7 

TREMBL :SPBC3Bfl_^ gene: "SPBC3BB ■ CH" =• product: "hypothetical 
protein"! 

S-pombe chromosome II cosmid c3Bfi--> N = Si Score = lt»H-i P = b-2e- 
13 

>TREMBL : SPBC3Bfl_^ gene: "SPBC3BS . 0=1" n product: "hypothetical 
protein"! 

S-pombe chromosome II cosmid c3B&- 
Length = 5=17 

HSPs: 

Score = IfcM bits)-, Expect = b.2e-13-> Sum P(2) = b-2e-13 

Identities = 4M/lEb (3M*)-, Positives = bfl/12fc, (53*) 



fluery: 3fc,7 DSDFDEKAKLKYYKEIEDRdKLKRK-KEEN STEEt3ALE- 

D<2NAKRAITY<2 41H 

D + +++ L YY+ ++ + K + +K + + EN S + +E + 

KR IT 
25 Sbjct: 472 

DREVEDfiDDLDYYESLDKKSKMAKKLRKENHDLERDLIRASRHPELIELGEGDKRGITLB £31 

(3uery : 415 I AKNRGLTPRRKKIDRNPRVKHXXXXXXXXXXXXGflVREVRKEEtfR- 
YSGELSGIRAGVK 473 

30 IAKNRGLTPRR K +RNPR+K + + Q Y+GE 

+GI+AG+ 
Sbjct: 532 

I AKNRGL TPRRPKE NRNPRLKKRJ1R YEK A KK KL A SKK A I YKG A PflGGY A GE(3 TGI K A GLV 511 

35 fiuery: 474 KSIKLK 1*7=1 

KSIKL+ 
Sbjct: 5^2 KSIKL<2 5^7 

Score = fiO (12-0 bits)-. Expect = k-2e-13-. Sum P(2) = b-2e-13 
40 Identities = 2^/12*1 (22*)-, Positives = bb/12T (51*) 

Query: 11? DEAETRVVK-DLAKVSVKEKLKMLRKESP--ELLELIE 

DLKVKLTEVKDELEPLLE 24^ 

D ++ + +K D + +++E ++ + + P ELL+++E + ++ L E+ 

45 ++L+P L 

Sbjct: 173 DNSDLKSIK(3I>SSAAAIEELV(3(3ISPDLPRTELLKILEAKHPEF(3LFLDEL- 
NdLKPtfLN 231 

<3uery: 250 LVE(3GIIPPGKGS(3YLRTKYNLYLNYCSNISF YL- 
50 ILKARRVPAHGHPVIERLVTYRNLI 3Dfi 

+++ + S<2 L+ + Y S + + FY +LK HP++ 

LV + 

Sbjct: 232 EIKEKL- 

KTYPSS(3LL(3A(3CTALSTYISFLTFYFALLKDGEEDLKNHPinVDLVRCK(3TU 2^D 



fluery: 30^ NKLSVVDdKLS 31^ 

+ D+ L + 

Sbjct: 2^1 ESYCGLDEVLT 3D1 
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Score = 5^ (fi.T bits)-. Expect = ^ • 2e-ll Sum P(2) = T-2e-ll 
Identities = lfl/5^ ( 3D* ) -» Positives = 35/5T (ST/.) 

5 tfuery: 1U VDE AETRVVKDL AKVSVKEKLKflLRKESPEL 

LELIEDLKVKLTEVKDELE--PLLEL 250 

++E + + DL + E LK + L + PE L+ + LK +L 
E+K++L+ P +L 

Sbjct: IflT IEELVtfdlSPDLPRT 

10 ELLKILEAKHPEF(3LFLDELN<2LKP(2LNEIKEKLKTYPSS(2L 2M5 

Query: 251 VE 252 
+ + 

Sbjct: 2Mb LQ 2M7 

15 

Score = 57 (fl.b bits)-i Expect = 3-Oe-Dl-. Sum P(2) = 2-be-Dl 
Identities = 13/5fi (22V.)-, Positives = 2b/5fl (MM*) 

duery: 3b7 DSDFDEKAKLKYYKEIEDRCKLKRK — 
20 KEENSTEE0ALED0NAKRAITYC2IAKNRGLT M22 

D + + ++ L YY + ++ + K+ +K KE + E + I 

RG + T 

Sbjct: M72 

DREVEDdDDLDYYESLDKKSKMAKKLRKENHDLERDLIRASRHPELIELGEGDKRGIT 52^ 

25 

Score = 42 (b-3 bits)-. Expect = 5-2e-0Ti Sum P(2) = 5-26-0^ 
Identities = 13/51 (25*)-, Positives = 2^/51 (5b*) 

duery: m AETR VVKDL AKVSVKEKLKI1LRKESPE-- 
30 LLELIEDLKVKLTEVKDELEPLLE 24 s ] 

+ET + D+++ + LK + + + + S + EL++ + L + EL 

+ LE 

Sbjct: IbO SETDAIDDIS(3liJADNSDLKSIK(2DSSAAAIEELVf2(2ISPDLP" 
RTELLKILE 210 

35 

Score = 3T (5-T bits)-. Expect = l-le-Dfi-7 Sum P(2) = 1-le-Dfl 
Identities = fl/lfl (MM*)-. Positives = 11/lfl (bl*) 

fluery: M3 YYGDdVDDFHEARSRAAL bD 

40 +Y + <3 + D RSRA L 

Sbjct: MD2 FYAN(3ID(3KAAKRSRAVL MM 



Pedant information for DKFZphamy2_2f22, frame 3

Report for DKFZphamy2_2f22.3



50 ELENGTHD M7T 

(CniilJ 5M,55fl.QD 
EpI3 5-50 

EHOMOLII TREI1BL : SPBC3Bfi_ T gene: "SPBC3BS - OT" % product: 

"hypothetical protein"; S-pombe chromosome II cosmid c3Bfi. le-10 
55 IEFUNCATJ 0M-05.01-DM transcriptional control ICS. cerevisiae-, 

YDLlSScD le-Dfl 
CBL0CKS3 PRDD52fiD 

CBL0CKS3 BLDD3bOC Ribosomal protein ST proteins 
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EBL0CKS3 BLDO^bM A Syndecans proteins 
EBL0CKS3 PRDDbSMG 
EBLOCKSJ PRDOfiSflH 

EBLOCKSJ BLOOfl2 l 4B Elongation factor 1 beta/beta ' /delta chain 
proteins 

EKliD All_Alpha 

EKIO LOW_C0f1PLEXITY SM - b3 Z 

EKIiO C0ILED_C0IL 7-1D V. 

SE<2 HVGRSRRRGA AKWAAVRAKAGPTLTDENGDDLGLPPSPGDTSYYfJDflVDDFHEARSRA AL 

SEG . ....xxxxxxxxxxxxxxxx 

PRD cccccchhhhhhhhhhhhhccccccccccccccccccccccccccchhhhhhhhhhhhhh 
COILS 



SE(2 AKGldNEVflSGDEEDGEEEEEEVLALDNDDEDDEDGGNAGEEEEEENADDDGGSSVaSEAE 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhcccccccccccchhhhhhhhhhhhhhccccccccchhhhhhhhhhccccccchhhh 
20 COILS 

SEfl ASVDPSLSlilG(3RKKLYYDTDYGSKSRGRfiSc3(3EAEEEEREEEEEA(3IIt3RRLA(3AL(3EI>D 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxx 

25 PRD hcccccccccccceeeecccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

SEd FG VAUVEAFAKPVPtfVDEAETRVVKDLAKVSVKEKLKNLRKESPELLELIEDLKVKLTEV 

30 SEG 

PRD chhhhhhhhhhccchhhhhhhhhhhh 
COILS 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

35 SE(2 KDELEPLLELVEt2GIIPPGKGS<2YLRTKYNLYLNYCSNISFYLILKARRVPAHGHPVIER 

SEG 

PRD hhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhh 
COILS 

ccccc 



SE<2 LVTYRNLINKLSVVDt3KLSSEIRHLLTLKDDAVKKELIPKAICSTKPKPKSVSKTSAAACA 

SEG xxxxxxxxxxxxxxxxxxxx . - 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhcchhhhhhhhhhh 
COILS 



SE(2 VTDLSDDSDFDEKAKLKYYKEIEDR(2KLKRKKEENSTEE(3ALEDtiJNAKRAITY(2IAKNRG 

SEG 

PRD hhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
50 COILS 



SE<2 LTPRRKKIDRNPRVKHREKFRRAKIRRRGtfVREVRKEEdRYSGELSGIRAGVKKSIKLK 

SEG xxxxxxxxxxxx • 

55 PRD cccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccchhh 
COILS

(No Pfam data available for DKFZphamy2_2f22.3)

group: nucleic acid management

DKFZphamy2_2g12 encodes a novel 191 amino acid protein with
similarity to NVL-2 of Rattus norvegicus.

The novel protein contains 3 EF-hand calcium-binding domains. The
related human VILIP Ca-dependent protein specifically binds the
3'-untranslated region of the neurotrophin receptor, trkB, an
mRNA localized to hippocampal dendrites in an activity-dependent
manner. The new protein exhibits elevated expression in brain
and testis.

The new protein can find application in studying the expression
profile of brain-specific genes and as a new marker for neuronal
cells.

strong similarity to NVL-2 (Rattus norvegicus)

Comment for P35332:
FUNCTION: MAY BE INVOLVED IN THE CALCIUM-DEPENDENT REGULATION OF
RHODOPSIN PHOSPHORYLATION.
TISSUE SPECIFICITY: NEURON-SPECIFIC IN THE CENTRAL AND PERIPHERAL
NERVOUS SYSTEM.
MISCELLANEOUS: PROBABLY BINDS TWO OR THREE CALCIUM IONS (BY
SIMILARITY).
SIMILARITY: TO OTHER EF-HAND CALCIUM BINDING PROTEINS; BELONGS TO
THE RECOVERIN SUBFAMILY.

Sequenced by MediGenomix

Locus: /chromosome="1"

Insert length: 4285 bp
Poly A stretch at pos. 4258, polyadenylation signal at pos. 4247



1 GGCGGCTCCG GCGCAGACCT TGGAGAGCAC AGCTGCCGGC CCGCGAGCCA
51 GCCTCGGTTC CCGCGGCCCG CCGAGGCTCG GAGCCATCCA GCGACCCGGC
101 GACCGGCCTC AGGCCCCGCC ATGGGGAAGA CCAACAGCAA GCTGGCCCCC
151 GAGGTGCTGG AGGACCTTGT TCAGAACACT GAGTTCAGCG AGCAGGAGCT
201 GAAGCAGTGG TACAAGGGCT TCCTGAAGGA CTGCCCCAGC GGCATCCTCA
251 ACCTGGAGGA GTTTCAGCAG CTCTACATCA AGTTCTTCCC CTACGGCGAC
301 GCCTCCAAGT TCGCGCAGCA CGCTTTCCGC ACCTTCGACA AGAACGGCGA
351 CGGCACCATC GACTTCCGGG AGTTCATCTG CGCCCTGTCG GTCACCTCCC
401 GCGGCAGCTT CGAGCAGAAG CTCAACTGGG CCTTTGAGAT GTACGACCTG
451 GACGGCGACG GGCGAATCAC GCGCCTGGAG ATGCTGGAGA TCATCGAGGC
501 AATCTACAAG ATGGTGGGCA CCGTGATCAT GATGCGCATG AACCAGGACG
551 GGCTCACGCC CCAGCAGCGT GTGGACAAGA TCTTCAAGAA GATGGACCAG
601 GATAAGGACG ACCAGATTAC ATTGGAGGAG TTCAAGGAGG CAGCCAAGAG
651 TGACCCATCC ATTGTGTTGC TGCTGCAGTG TGACATGCAG AAGTAGAAGC
701 TGGTGAGGGG CAGGGTCCCT GGCCAGAAGG GGCATGGCCA CCTCCCAACC
751 TGATGACCTC TCTGGCTGGC CTCCCAGGAG GAGGGACACT CCAGCCCCCC
801 TCTCTGGCCC ACCCAGTCCT CTGCCCAAGC CCTTCCTCCC CTCCATCAAG
851 ATCTTTGAGG GACCACCTCA CCCTGCAAAA GAGACAGGTC CTCCAGTACC
901 CTGTCTTCTA GCCCCACCTC CCACTTGGCC AGAACCAATG TCCATTGGGC
951 ATAGGGGAGT TGGCTTTTGC CCCAGGAGGT GAGGTTAAGG AGTTGGGGGC
1001 CTGGGGTTCT GGTTAGGAAT TCTCTTGATC CTGGGATTAT GCTTTATAGG
1051 ATGTGGTCCC ACAGGCCTGT CACAGGGCCA AATTGGGTCT GTCCATTCCT
1101 GAGGCTCCAG ATCCCATAAA GGGGGTCTCT TCCCCATCCC TTCTACTCTA
1151 CCTGGCCCTT CCAGCCCCAG CCTTTGGAGC GTTCATTCAG TCCTTTCTTC
1201 AGCTAATGAT TACTGAGCAC CTGTTTGGTG CTAAGGATAT GGTCATTTAC
1251 AAGACACATC TTGTGCCCTC TGGAAGCTCA TAGGGTTGTG AGGCAAACTT
1301 CCAGCCGTCA GGGTCTCAGC TAAGCAGAAG GTGCTGGAAG GCTGGTTAGT
1351 CTGGGAGGAG CTATTTCATC TTCCAGCTCA GCTCCACACA AAGCTGCAGA
1401 AGGACGAAAT GAAAAGCATT TGGAAGTTTA GGAGCCACGT GAGTGAAAGT
1451 TTTAAGAAAA ATGAAATTTA TGTCATACTT ATTTTTTTAG TACCCTTTAA
1501 AGGAGCTACA GTCATTTTAT TATTTCAGGA GGTTAAAATA TACTCTATAT
1551 TACTTGGTTT ATTATAAAAT GATTAAATGA ATAGAGAAAA TATTAATTTT
1601 CAAGGGGAAA AAACCTGAGA AGAAAGGGAG AAAAGACCAT GAAATTTACC
1651 AGATAACACT TTTTAAGACT AAGTCCTGAG CTGCCACTCT CAGCAGTTTT
1701 TGCTGCTTCA GCTCTTCCTT TTTATTACCT TTTTCAATTC AACAAGCAAC
1751 TTTCTGCTAC ATACTTACTC CGGTTGGGTG CTGACTTCAG GGACAGGAAA
1801 AAGCAAGGTT TGCAAAGAGT GAAACTAGTG TATATTCCGT ATCTTGGTAG
1851 TTCGTTTCTG GATTGGGTTT AGTTTCAGAA CTGGACTTGT TCCTTCACTG
1901 CCACAGAATC AGAAAGAGCT AGAAGAAAAG GCTCACCTGG CCACTGTTTA
1951 GGCACCCAGA CATAATTTAT GGACGAAATG CCTAAAAATG TGCCAGGCAT
2001 GCTCTGTTTG AGAGGCTTTT TCTAACCCCA AATCTTAGAT CTGCCAGGTA
2051 GTTCAACATC TTCCAAGTGT GCTGGTTCTG CTTTCCAATG CCTGCTTCCC
2101 AATTTTGGAT CCATGAGCTA TACAGCTGCA TGCTTTGACT GCCGGAAAAA
2151 TTAATCTTGC TTCTTCATCA GGTCTTTCTC CTGTACTTGT GATCAGAAAT
2201 TACCTTTGAC GTGCAGTGAC AGTTGATTTC CTCTTGAACT GCCGGTGAAA
2251 ACAGTCTAGT ACACAGGTGC TGTCAGCCCA GGGTGGGAGC AGGAAATGAT
2301 TGCTGAGCCC GGGGCAGGGG AATTGCATCT GCAGGAAAGA GATGCAGCAT
2351 GCTCCTCACT CCTGAGTGCC CACCTGTCCT GCTTCTCTGC AGGTGAAAAC
2401 TCTGGGGGAT GCTGATCAAT AGAGCTTGGT CCCAAGCTCT ACTGGGCCCT
2451 TGGAGGTAGC AAGGCCACTG GGTTGCTATC CTCTTGATGG GGATAGCAAC
2501 CACTGGTTTG CAACCACTGG GTTGCTATCC TTTTGCTATC CTCTTGCTCA
2551 TGACCAGCCA TATGGTGAGG CTGGGGAGTT CACATCCTCA GGCAGGAACT
2601 AGCAGTTGTT TATCCAGCAA TGCCTCAAGG ATGTTGCATT GCTCCCAGGA
2651 GCTGGCTATT AGGTATGTCT TGTGCGGTCA GTCAGCATCA CAGACACATA
2701 GATGCTCACC AGCCTGGCTT AGCTGGGACC TAAATCTTCT GGTGAAAAGC
2751 TTTTCACTAA GTGAGGTTCC TTCCCTGCAA ATGCTGAATC TAGCCTAATT
2801 CGCAACCACA CAGAATTTCA TGGCTTTCAA AGGCTTGCCA TGTGCCCCAT
2851 CTCATTCTAT ACTCACATCC CATGGAGGTG AGGATTTTCA CTTCTTTTCT
2901 CTAGACTTGG AAGCTGAGAT TCAGAGAGGA AGCATCCCTT GTGCAAGATC
2951 ACATAGTCAG GAGGTGACAC AGGGCTAAGA CTTGAACCAA GGCTCTAAGA
3001 GGATTTCTTC TTTTCAGAGT CTCTTCCCTG TCCATTTCTG TGACTAAGCT
3051 GTGCAGAGGT TGACAGCAGG GCAAGTTACA TTGATATTCA TCCTTTATAG
3101 GCTTCCTGCT AAAAAGCTTC TGAGATTGTG GTCTTCCAAA AAAAATAGGA
3151 GCTTGGTTGA AGTCCCCACA TTTTCAAGCA CTCAGTGTTC TGCCTCTGGC
3201 AGCTGTGCTA ACAGCTCAGT GCTGTCCTGG GAGTCCTCTG ACTCAGAACC
3251 CTCGAAGCAT CCTGCATTGT CTTTACCCAC CATCATCGTC ACTAAGAGAA
3301 ACATGCCTAC CCATGAAGGC GTGTTTGATT ACTCCAGGCT TCTGGACACA
3351 CATACCCATG GGTGATTTTT GCTCCTCAGG CCCAATATTC TCAGACAGCC
3401 CAGCAGTGTG AACACACAAT GCCAGGCCAG GAACTGGGAC CACCATCTTG
3451 CTGATGGAAG GAACAACAGG TGGCCCAGGA CATGCTCCTG CATACTCCTG
3501 GGTGTCCCAG GGACTGTGTG CTCAGGAGCA CTGTGGTAGA GCACTGGCCC
3551 TGCCTTGAGA AGAGACACAG GTCTCCCGTC CCTGCACCAG CTGAGAGAGA
3601 CTTGCCACAA AGCACAAGGC TGGCAGAGAT TTATGTATGA CTTGCACAGA
3651 CACAAAAATA TACAGACAAT CAAAACATTG ATATATTCAA ACTCTCCTTT
3701 AAATTCCAAT CTTATTGCAA CAACTCTGTG AATTGCAAGG TCCCAGAATC
3751 TGCCTTCTCA CATACTCTAC CCTCATTCAT CCTTTTGGGC TAATTGATGA
3801 GCATCTTATT TCTTATCTCT AAAAATTATC AGCAAAGGCT ACTTCAGATG
3851 GCCACTTTAG TCCTTTCAGC TGTAGTCAGG ATTATTTAAC TTACCTGTAT
3901 ATCAAAAGTG AAGAAAAAGT TAGTTCATAA GTAAAGGCAC TAAATCCTTT
3951 CCTGACAATG GCAGAGTCTC TAGAGGTAGA AATTTGCCTT GCTGCAGAGA
4001 GAGAAGGAAT GGCGTGGGAT GGGGGAAAGA AAAGAAAGAG AAGAAGAGAA
4051 GAAGCTGGGG TCTCCAGGCA GGGTAGTAAG CTGACACTAA ATATTTTTTA
4101 CACAAAAATG TATTGAAGCA ACAAATATTT CCTGAAGATC CACCCTGGGT
4151 GAGGCTTTGA GCTGACTTTA GAGATCACTG TGGGGTCAAG AATGTCTTAC
4201 ATGTTTTATT CATCATTCTT GAAAAAAGAA ATAATTCAAA CCTTGGAATT
4251 AAAAAGTCAG AAAAACAAAA AAAAAAAAAA AAAAA



BLAST Results 



No BLAST result 



Medline entries 



^33t7M70: 

Kajimoto Y., Shirai Y., Mukai H., Kuno T., Tanaka C.; Molecular
cloning of two additional members of the neural visinin-like
Ca(2+)-binding protein gene family. J Neurochem 1993
Sep;61(3):1091-6

^07^121: 

Polymeropoulos M.H., Ide S., Soares M.B., Lennon G.G.; Sequence
characterization and genetic mapping of the human VSNL1 gene, a
homologue of the rat visinin-like peptide RNVP1. Genomics
29(1):273-275 (1995).



Peptide information for frame 1 



ORF from 121 bp to 693 bp; peptide length: 191

Category: strong similarity to known protein
Classification: Protein management
Prosite motifs: EF_HAND (73-85)
EF_HAND (109-121)
EF_HAND (159-171)



1 MGKTNSKLAP EVLEDLVQNT EFSEQELKQW YKGFLKDCPS GILNLEEFQQ
51 LYIKFFPYGD ASKFAQHAFR TFDKNGDGTI DFREFICALS VTSRGSFEQK
101 LNWAFEMYDL DGDGRITRLE MLEIIEAIYK MVGTVIMMRM NQDGLTPQQR
151 VDKIFKKMDQ DKDDQITLEE FKEAAKSDPS IVLLLQCDMQ K



BLASTP hits

WO 01/98454 PCT/IB01/02050

No BLASTP hits available

Alert BLASTP hits for DKFZphamy2_2g12, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphamy2_2g12, frame 1

Report for DKFZphamy2_2g12.1



ELENGTH J 531 
15 EI1WJ 2^277-^2 
EpI! 5-2L 

EH0N0LJ PIR:JHDfll5 neural visinin-like Ca2+-binding 

protein-type 2 - rat Ie-1D7 

EFUNCATJ Tfl classification not yet clear-cut ES. cerevisiae-i 

20 YDR373wID 3e-52 

EFUNCATJ D3-01 cell growth ITS - cerevisiae-. YKLl^DwH 3e-lfl 

EFUNCAT3 03-07 pheromone response-i mating-type determination-! 

sex-specific proteins ES- cerevisiaei YKLlTOwJ 3e-lfl 

EFUNCATJ 13-DM homeostasis of other ions ES. cerevisiaei 
25 YKLlTDwD 3e-lfi 

EFUNCATIB 0M.D5.D1.0 1 * transcriptional control ES. cerevisiae-i 

YKLnOwJ 3e-lfl 

EFUNCATJ 3D-D3 organization of cytoplasm ES. cerevisiaei 
YKLnOwJ 3e-lfl 

30 EFUNCATJ 11-D1 stress response ES- cerevisiae-. YGRlDOwJ 7e-0M 

EBL0CKSJ BLDD303B S-100/ICaBP type calcium binding protein 

EBL0CKSJ BL00016 

EBL0CKSJ PR0DH50G 

EBL0CKSJ PRD0M50F 
35 EBL0CKSJ PRD0M5DE 

EBL0CKSJ PRDOM50D 

EBL0CKSJ PRDQ45DC 

EBL0CKSJ PR00M50B 

EBLOCKSJ PRD0M50A 
40 ESCOPJ dlosa 1-37. 1-5. 13 Calmodulin ^Paramecium 

tetraurelia) fle-25 

ESCOPJ dlrec 1-37. 1-5-21 Recoverin Ebovine (Bos 

taurus) le-72 

ESCOPJ dlaMpa_ 1-37-1-2-5 Calcyclin (S100) EHuman (Homo 

45 sapiens)-! PI 7e-D5 

ESCOPJ dlrro 1 • 37 - 1 - M - 1 Oncomodul in Erat (Rattus 

norvegicus) 2e-17 

ESCOPJ dlsyma_ 1-37-1-2-2 Calcyclin (S100) Erat (Rattus 

norvegicus) Te-m 

50 ESCOPJ dMicb 1-37-1-1-1 Calbindin DTK Ebovine (Bos 

taurus) 2e-lfl 

ESCOPJ dlauib_ 1-37.1-5-1^ Calcineurin regulatory subunit 

(B-chain le-M5 

EPIRKUJ blocked amino end le-TT 

55 EPIRKUJ phosphotransferase 3e-DB 

EPIRKUJ duplication 7e-17 

EPIRKUJ tandem repeat 7e-0b 

EPIRKUJ heterodimer 7e-17 
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EPIRKbO heart 7e-Db 

EPIRKUJ serine/threonine-specif ic protein kinase 7e-0b 

EPIRKbO acetylated amino end 7e-0b 

EPIRKIO ATP 7e-Db 

5 EPIRKIO skeletal muscle 7e-0b 

ILPIRKliD signal transduction Me-bT 

EPIRKIO protein kinase 3e-0fi 

EPIRKIO calcium binding le-^T 

EPIRKIO alternative splicing le-13 

10 EPIRKIO lipoprotein le-n 

EPIRKIO cardiac muscle 7e-0b 

EPIRKIO muscle 7e-Db 

EPIRKIO myristylation le-^ 

EPIRKIO EF hand le-^ 

15 EPIRKIO retina le-Mb 

ESUPFAnil calcium-dependent protein kinase 3e-0fl 

CSUPFAriJ unassigned calmodul in-related proteins Ee-3M 

ESUPFANJ protein kinase homology 3e-0fl 

ESUPFANJ calmodulin le-TT 

20 CSUPFAMJ calmodulin repeat homology le-TT 

EPR0SITE3 EF_HAND 3 

EPF AMI EF hand 

EKliO All_Alpha 

EKIO 3D 

25 

SEC GGSGADLGEHSCRPASt3PRFPRPAEARSHPATRRPASGPAf1GKTNSKLAPEVLEDLVt3NT 
Irec- 

HHHHHHHHHTTTT 

30 

SE(3 EFSE(3ELK(3UYK6FLKDCPS6ILNLEEFc3t3LYIKFFPY6DASKFA(3HAFRTFDKN6DGTI 

Irec- CCCHHHHHHHHHHHHHHTTTTEEEHHHHHHHHHHHTTTTCHHHHHHHHHHHH 

--CEE 

35 SEC DFREFICALSVTSRGSFE(3KLNWAFEriYDLl>GI>GRITRLEnLEIIEAIYKnVGTVini1Rn 
Irec- 

EHHHHHHHHHHHHCCCGGGHHHHHHHHHTTTTCCCEEHHHHHHHHHHHHHCCTTTTGGGC 

SE<3 N(3I)GLTP(3£3RVPKIFICKnDf3]>Kl>P(3ITLEEFKEAAKSDPSIVLLL(3C])n(3K 
40 Irec- TTTTTCHHHHHHHHHHHHCCTTTTEECHHHHHHHHHHCHHHHHHHCCCHHH 



45 



PSOOOlfl 
PSDDDlfi 
PSDDDlfi 



Prosite for DKFZphamyE_EglE . 1 



U3->lEb 
!M c l->lbE 
l c J c i->SlE 



EF_HAND 
EF_HAND 
EF_HAND 



PD0CDD01B 
PDOCODDia 
PDOCDOOlfl 



50 



Pfam for DKFZphamyE_EglE - 1 



55 HNH_N AME EF hand 



Hnn 



*EIqEnFrmni>kl>GDGyIDFEEFmennkem* 
Q +FR +DK+GDG+IDF EF+ +++ 
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10M FAC3HAFRTFDKNGDGTIDFREFICALSVT 132 



27-15 1M0 Ibfi 1 2=5 dkf zphamy5_Egl2.1 strong 

similarity to NVL-2 (Rattus norvegicus) 
5 Alignment to Hf1H consensus: 

Query *ElqEnFrmHDkDGDGyIDFEEFmenHkem* 

++++F+N+D DGDG+I+ E++E++ + + 
dkfzphamye 1MD KLNUAFENYDLDGDGRITRLEriLEIIEAI Ibfi 



10 Query 51fl 1 ' 2^ dkf zphamy2_J?gl2 - 1 strong 

similarity to NVL-2 (Rattus norvegicus) 

Alignment to HHM consensus: 
Hnn ~ ^ElqEnFrmMDkDGDGylDFEEFmefirikem* 

++++F++MD+D+D +I+ EEF+E+ K + 
15 Query RVDKIFKKMD(2DKDI>d3ITLEEFKEAAKSD 21B 



DKFZphamy2_2i17

group: amygdala derived

DKFZphamy2_2i17 encodes a novel 462 amino acid protein without
similarity to known proteins.

Most ESTs are derived from brain and pancreas.

No informative BLAST results; no predictive Prosite, Pfam or SCOP
motif.

The new protein can find application in studying the expression
profile of amygdala-specific genes.

unknown protein

perhaps complete cds.

Sequenced by MediGenomix
Locus: unknown

Insert length: 3473 bp
Poly A stretch at pos. 3454, polyadenylation signal at pos. 3436



1 GATATCCCAA TCTTTGGACT GCATCCTGGT TGCCTCTACT GTGGTCACCT
51 TTGGGAAGAA ATGTCTTCTG TAAAAAGAAG TCTGAAGCAA GAAATAGTTA
101 CTCAGTTTCA CTGTTCAGCT GCTGAAGGAG ATATTGCCAA GTTAACAGGA
151 ATACTCAGTC ATTCTCCATC TCTTCTCAAT GAAACTTCTG AAAATGGCTG
201 GACTGCTTTA ATGTATGCGG CAAGGAATGG GCACCCAGAG ATAGTCCAAT
251 TTCTGCTTGA GAAAGGGTGT GACAGATCAA TTGTCAATAA ATCAAGGCAG
301 ACTGCACTGG ATATTGCTGT ATTTTGGGGT TATAAGCATA TAGCTAATTT
351 ACTAGCTACT GCTAAAGGTG GGAAGAAGCC TTGGTTCCTA ACGAATGAAG
401 TGGAAGAATG TGAAAATTAT TTTAGCAAAA CACTACTGGA CCGGAAAAGT
451 GAAAAGAGGA ATAATTCTGA CTGGCTGCTA GCTAAAGAAA GCCATCCAGC
501 CACAGTTTTT ATTCTTTTCT CAGATTTAAA TCCCTTGGTT ACTCTAGGTG
551 GCAATAAAGA AAGTTTCCAA CAGCCAGAAG TTAGGCTTTG TCAGCTGAAC
601 TACACAGATA TAAAGGATTA TTTGGCCCAG CCTGAGAAGA TCACCTTGAT
651 TTTTCTTGGA GTAGAACTTG AAATAAAAGA CAAACTACTT AATTATGCTG
701 GTGAAGTCCC GAGAGAGGAG GAAGATGGAT TGGTTGCCTG GTTTGCTCTA
751 GGTATAGATC CTATTGCTGC TGAAGAATTC AAGCAAAGAC ATGAAAATTG
801 TTACTTTCTT CATCCTCCTA TGCCAGCCCT TCTGCAATTG AAAGAAAAAG
851 AAGCTGGGGT TGTAGCTCAA GCAAGATCTG TTCTTGCCTG GCACAGTCGA
901 TACAAGTTTT GCCCAACCTG TGGAAATGCA ACTAAAATTG AAGAAGGTGG
951 CTATAAGAGA CTATGTTTAA AAGAAGACTG TCCTAGTCTC AATGGCGTCC
1001 ATAATACCTC ATACCCAAGA GTTGATCCAG TAGTAATCAT GCAAGTTATT
1051 CATCCAGATG GGACCAAATG CCTTTTAGGC AGGCAGAAAA GATTTCCCCC
1101 AGGCATGTTT ACTTGCCTTG CTGGATTTAT TGAGCCTGGA GAGACAATAG
1151 AAGATGCTGT TAGGAGAGAA GTAGAAGAGG AAAGTGGAGT CAAAGTTGGC
1201 CATGTTCAGT ATGTTGCTTG TCAACCATGG CCAATGCCTT CCTCCTTAAT
1251 GATTGGTTGC TTAGCTCTAG CAGTGTCTAC AGAAATTAAA GTTGACAAGA
1301 ATGAAATAGA GGATGCCCGC TGGTTCACTA GAGAACAGGT CCTGGATGTT
1351 CTGACCAAAG GGAAGCAGCA GGCATTCTTT GTGCCACCAA GCCGAGCTAT
1401 TGCACATCAA TTAATCAAAC ACTGGATTAG AATAAATCCT AATCTCTAAA
1451 TCTAAGAACT AAGCTTTGAG TATTATTTAA TAATTTCTAA TAACACTCAT
1501 TCCTCAAGTG ATATTAGAGA TTATTCAGTA CTCTTGAGAG TGTCACAACA
1551 CAAAATACGA TGTTGGGTTT TCGAAATATT TTCAAAGTGT TCTGTCTTAA
1601 TCACAAATTC ATATTTTTAC ACATTTTTAC AATATTGCCT CAGATTATGT
1651 TAAATTTGGG TCAGTCTTCT CTGAACTTTT TCTCTCTCGG TTTCTTTTCT
1701 TCCTTCACAG TTTTATCTCA CAAAACCATT TTTCTAATAA GAGACATCAT
1751 GTTGGAAAGA TGTTGTAGAA ATGTGCATAA ATTTCAGTGC CTCTTGTAAG
1801 CATTAAACTG ATGATGAAGA AAGTTCCTGA TTTGAGAAAT GAATCAAAGT
1851 AATTTTAATG AATTTTTAGC TTGTATTAGC TTGAGTTAGC TGGCATTGAT
1901 TTTTTAGTCC TTTTGTTACC TTTAAGTTGT CAATATATGG TTTTTGTTCA
1951 TCTCCCCATT GTAGTCCCAC TTGCTCTTTC CTGGGGGTTC CATTGTTCTA
2001 GCAGTGGAGG TGTTACAGTG TCGCCACTCG TCTAATTTGA CCAGTGTTAA
2051 GAATTTTCTA ATTTAATAAT TTAATAGTGA TCTCAATACC ACACCCTCAT
2101 GGAAGGAGAA AAGCATACTA TTATATCTGG GACCTCTCTT TTAGACCTAA
2151 AATTAATTAA CATATCTACT TATATGTTAC TTATACCTAA AGCTGTTATT
2201 AAGACAAACC AAGATTCTCT GCTTTTGCAC TGAAATTAAA CTTGAAAGGA
2251 ATTCTCCTCA AAGGTCGGAT ATTAAATAAG TCCCAGGCAG ATTTACATAT
2301 TTAATTTAAA ACATTGGCTT TATTTCATTT TGTGATGAGT GATGTATCTG
2351 TGTTAACAAA AAATTGTATA ATCATTACCA ATACTATTTA TTATGCTCAA
2401 ATATATCTTG GCTTTGACCT TATTTCAACA CATTCTAAGA AGCCTTGACA
2451 AAGTAAGTAT ATTTTAGAGC TGAATCAGTA AGATTCTAGA GAAAGCAAAA
2501 CATAGTAGTT CACAATTTTG CAACATAGAA AGTCACATTT TGAAAGGCTA
2551 TTTTGAAATT GATTTAATAG CTATTATAGT TTATGAATAT CAAAATTTGT
2601 ATAATTTGCA TCTTTACTAA TGTATGCTAG AGCTACAAGA GACCTTAAGG
2651 ATAATATATG AAATTAGCTT TCCTTATTTT ATAGATAAGG AAAAAGAAAT
2701 TGTGAAAGGT GAATTTACCT AATTAGTGAA AGTTACATAA CTAATTACAA
2751 CAGTCTGTAC TATATAATGC AGAGGACGAT TCTCCCTGTA AAAGGAACTA
2801 GAAGCTATTA CTAAAAATAT ATATAGACAA AATTAAAAGA AGGAATGATA
2851 AGAATAAATT TAATTTACCA AATATTGTTA ATTAAAATTT TAGATACTTA
2901 ACATTTATTT AACTTAAATA AAAGATAACT GTCAGATAAA ACTTTATTTT
2951 ACTAATGAGC AGTGATTTTC TTAGGAATTG ATGAAGGCTT ATTGGTATCA
3001 AGAATTTAAA CCAAATTAAA ACTGACAGAG GACATTTAGA TACATAATAA
3051 AATTCGAGCT ACATAAGTAT ATGGAAAATA ATGTACCTTG ATTATTATGA
3101 AATAGAGCAT CTTGAAATTC AGTTTTACTC TAAATGTACT TTTAATACTT
3151 GCAGATTCTA AGATTACATT GTGAAATTCC AGGTTTTCAT AATGTTAAAA
3201 TAGGAAAGTA GAATATAAAG TATCAACAAG TGTAGTTATA CATTTTGTTT
3251 TGGATATTTA ATCCTTACTT GGGAAAAAAT CAGCATCTAG GTAAATTATT
3301 ATTTTAATAA GAACTCTTAA ATTGCCAACC TCTGAGAGGT GAAAAGCTAT
3351 £TAAATAGAA GGAATGGCCA GTTCAAAAGA ATAGTAGAAG TGATAGTGCC
3401 GTGAATGTAT TCTACTGGAA ATGAATGTAA TAATACATTA AATTTTTAAA
3451 ATCGAAAAAA AAAAAAAAAA AAA

BLAST Results

No BLAST result

Medline entries

No Medline entry

Peptide information for frame 1

ORF from 61 bp to 1446 bp; peptide length: 462
Category: putative protein
Classification: unclassified
Prosite motifs: MUTT (355-374)



1 MSSVKRSLKQ EIVTQFHCSA AEGDIAKLTG ILSHSPSLLN ETSENGWTAL
51 MYAARNGHPE IVQFLLEKGC DRSIVNKSRQ TALDIAVFWG YKHIANLLAT
101 AKGGKKPWFL TNEVEECENY FSKTLLDRKS EKRNNSDWLL AKESHPATVF
151 ILFSDLNPLV TLGGNKESFQ QPEVRLCQLN YTDIKDYLAQ PEKITLIFLG
201 VELEIKDKLL NYAGEVPREE EDGLVAWFAL GIDPIAAEEF KQRHENCYFL
251 HPPMPALLQL KEKEAGVVAQ ARSVLAWHSR YKFCPTCGNA TKIEEGGYKR
301 LCLKEDCPSL NGVHNTSYPR VDPVVIMQVI HPDGTKCLLG RQKRFPPGMF
351 TCLAGFIEPG ETIEDAVRRE VEEESGVKVG HVQYVACQPW PMPSSLMIGC
401 LALAVSTEIK VDKNEIEDAR WFTREQVLDV LTKGKQQAFF VPPSRAIAHQ
451 LIKHWIRINP NL



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphamy2_2i17, frame 1

No Alert BLASTP hits found

Pedant information for DKFZphamy2_2i17, frame 1

Report for DKFZphamy2_2i17.1



35 ELENGTHID Mb2 

EMbO 5E07b.E5 
CpIJ 

(CHOnOLU TREMBL : SPBC177B_3 gene: "SPBC177A - 03c" ; product: 

"conserved hypothetical protein"=i S-pombe chromosome II cosmid 
40 , cl77fi. le-M5 

CFUNCAT3) ^ unclassified proteins ITS- cerevisiaei YGL0t7wJ 

CFUNCATJ r general function prediction EH • influenzae-! 

HI0M3E pyrophosphohydrolaseH Me-EM 

45 EFUNCATJ 1 genome replication-i transcription^ recombination and 

repair Ell- jannaschiii MJim^ nucleotide pyrophosphohydrolasel 
le-OM 

EBL0CKS3 BLODEITF Anion exchangers family proteins 

EBL0CKSJ BL01E^3B 

50 EBL0CKS3 DriDlTD^ 

EBL0CKS3 PF00OE3A 

1EBL0CKS J BLOOfiTS mutT domain proteins 

ISC0P3 dlawcb_ 1-^1-3. 1-E GA binding protein (Ga'bP) alpha 

GA bindini Ee-35 

55 ESUPFANI hypothetical protein HI0M3E le-EE 

EPR0SITEJ MUTT 1 

CPFAim Bacterial mutT protein 

EPFAMID Ank repeat 
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(EKliO Irregular 
EKIO 3D 

5 SE(3 f1SSVKRSLK(3EIVT(3FHCSAAEGDIAKLTGILSHSPSLLNETSENGli)TALriYAARNGHPE 
lawcB -CCCTTTTCTTTCCHHHHHHHHTTHHHHHHHHHCCCTT- 
TTEETTTEEHHHHHHHHCCHH 

SE(2 IV(3FLLEK6CDRSIVNKSRl3TALDIAVFUGYKHIANLLATAKGGKKPWFLTNEVEECENY 
10 lawcB 

HHHHHHHHCCTTTTCBTTTBCHHHHHHHHCCHHHHHHH 

SEC3 FSKTLLDRKSEKRNNSDULLAKESHPATVFILFSDLNPLVTLGGNKESFd3(2PEVRLCc3LN 
lawcB 

15 

SE(3 YTDIKDYLAflPEKITLIFLGVELEIKDKLLNYAGEVPREEEDGLVAUFALGIDPI AAEEF 
lawcB 

20 

SE(2 KCRHENC YFLHPPMP ALL(3LKEKEAGVVAd3ARSVLAlilHSRYKFCPTCGNATKIEEGGYKR 
lawcB 

25 SE<2 LCLKEDCPSLNGVHNTSYPRVDPVViri(3VIHPDGTKCLLGRflKRFPPGnFTCLAGFIEPG 
lawcB 

SE<3 ETIEDAVRREVEEESGVKVGHV(3YVAC(3PUPnPSSLniGCLALAVSTEIKVDKNEIEDAR 
30 lawcB 

SE<3 UFTRE(2VLDVLTKGKt3(3AFFVPPSRAIAH(3LIKHli)IRINPNL 
lawcB 

35 

Prosite for D»CFZphamyE_5il7-l 
40 PSDDBT3 355->375 MUTT PDOCODbTS 



Pfam for DKF2phamyE_2il7 - 1 

45 

HMI1_NANE Ank repeat 

HF1N *GyTPLHIAARyNNvEf1VrlLLt2HGADIN# 
50 G+T+L++AAR+++ E+V++LL++G D 

fluery Mb GbJTALNYAARNGHPEIVdFLLEKGCDRS 73 



55 HMrMslAflE Bacterial mutT protein 

Htin 

#ILMiqRedppnHYdtHhgdWIFPGGkIEeGETPE(3CarREIliJEETGI# 
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L++++++ +++ + 
++G+IE+GET+E+++RRE++EE+G+ 

Query 33? CLLGRflKRF — PPG 

MFTCLAGFIEPGETIEDAVRREVEEESGV 377

group: intracellular transport and trafficking

DKFZphamy2_2o13 encodes a novel 590 amino acid protein with high
similarity to murine synaptotagmin 3.

The novel protein contains two C2 domains. The C2 domain is
thought to be involved in calcium-dependent phospholipid binding.
Synaptotagmins are essential for Ca(2+)-regulated exocytosis of
neurosecretory vesicles.

The new protein can find application in modulating/blocking
synaptic activity.

similarity to synaptotagmin 3 (Mus musculus)

Sequenced by MediGenomix
Locus: unknown



Insert length: 2931 bp

Poly A stretch at pos. 2912, polyadenylation signal at pos. 2884

1 ACTCTATGTC TCCTCTCGTT GGATTGTGAC ACCGGGAGGT CAGGGAACTC
51 CAGGACCTTG TTCTCTGCTG GATTCGCAGC AACCAGCACA GCACGTAGGG
101 CGTAGTTGGT GCTGGATGGA TGTTTGTTGA ATGAATGAAT GATGAATGGC
151 TGGCACCTTG TCTGCTCATC CCTAACTCCT GTTCCTTCAT CTGTGCAGCC
201 CTAATCTTTG TTTCCTCATC TGTCCATCCC TTTATTTGTG CATCCTCATT
251 CTTAGCCCCT TCACTGCCCT TCTCCATCTC TTCCTCCTTG TTCATTTGTC
301 CCTGTTCTCT GTCCTCTACT CCACTCATGC CCATCTCTGT CCCCTTGACT
351 TACCCAGTCC CTGCTACTAT CTCCATCCCT AATTTCTGCC CTCTTGTCTG
401 TCTACTCCTA ATTCCTTTTC CTTGTCCATC CCTAATACCT GTCACCTTGT
451 CCTTCTTCCT CGAATCTCCA TCCCTAATCC ATCTGCCCCT AATCTCTGTC
501 CCCTTTGCCC ATCCTTCCTT TTCTCGGTGT CTCTTTCCAC CCTTATCTCC
551 ACACCTGCCC ACCCTGCACT CCCATTCTGT TTCCCATCTG CACCCTTGCC
601 CCATCCCTCC CACACACAGG ACCAGACGGC CACCATGTCA GGAGACTACG
651 AGGATGACCT CTGCCGGCGG GCACTCATCC TGGTCTCGGA CCTCTGTGCG
701 CGGGTCCGAG ATGCTGACAC CAACGACAGG TGCCAGGAGT TCAATGACCG
751 AATCCGAGGC TATCCCCGGG GTCCAGATGC AGACATCTCC GTGAGCCTGC
801 TGTCGGTCAT CGTGACATTC TGTGGCATTG TCCTTCTGGG TGTCTCTCTC
851 TTCGTGTCCT GGAAGTTGTG CTGGGTGCCC TGGCGGGACA AGGGAGGCTC
901 GGCAGTGGGC GGTGGCCCCC TGCGCAAAGA CCTAGGCCCT GGTGTCGGGC
951 TGGCAGGCCT GGTAGGCGGA GGCGGGCACC ACCTGGCGGC TGGCCTGGGT
1001 GGCCATCCTC TGCTGGGCGG CCCACACCAC CATGCCCATG CCGCCCACCA
1051 TCCACCCTTT GCTGAGCTGC TGGAGCCAGG CAGCCTGGGG GGTTCTGACA
1101 CCCCTGAGCC CTCCTACTTG GACATGGACT CGTATCCAGA GGCTGCAGCA
1151 GCAGCAGTGG CCGCTGGGGT CAAACCGAGC CAAACATCCC CTGAGCTGCC
1201 CTCTGAGGGG GGAGCAGGCT CTGGGTTGCT CCTGCTGCCC CCCAGTGGTG
1251 GGGGCTTGCC CAGTGCCCAG TCACATCAGC AGGTCACAAG CCTGGCACCC
1301 ACTACCAGGT ACCCAGCCCT GCCCCGACCC CTCACCCAGC AGACTCTGAC
1351 CTCCCAGCCG GACCCCAGCA GTGAGGAGCG CCCACCTGCC CTGCCCTTAC
1401 CCCTGCCTGG AGGCGAGGAA AAAGCCAAAC TCATTGGGCA GATTAAGCCA
1451 GAGCTGTACC AGGGGACTGG CCCTGGTGGC CGGCGGAGCG GTGGGGGCCC
1501 AGGCTCTGGA GAGGCAGGCA CAGGGGCACC CTGTGGCCGT ATCAGCTTCG
1551 CCCTGCGGTA CCTCTATGGC TCGGACCAGC TGGTGGTGAG GATCCTGCAG
1601 GCCCTGGACC TCCCTGCCAA GGACTCCAAC GGCTTCTCAG ACCCCTACGT
1651 CAAGATCTAC CTGCTGCCTG ACCGCAAGAA AAAGTTTCAG ACCAAGGTGC
1701 ACAGGAAGAC CCTGAACCCC GTCTTCAATG AGACGTTTCA ATTCTCGGTG
1751 CCCCTGGCCG AGCTGGCCCA ACGCAAACTG CACTTCAGCG TCTATGACTT
1801 TGACCGCTTC TCGCGGCACG ACCTCATCGG CCAGGTGGTG CTGGACAACC
1851 TCCTGGAGCT GGCCGAGCAG CCCCCTGACC GCCCGCTCTG GAGGGACATC
1901 GTGGAGGGCG GCTCGGAAAA AGCAGATCTT GGGGAGCTCA ACTTCTCACT
1951 CTGCTACCTC CCCACGGCCG GGCGCCTCAC CGTGACCATC ATCAAAGCCT
2001 CTAACCTCAA AGCGATGGAC CTCACTGGCT TCTCAGACCC CTACGTGAAG
2051 GCCTCCCTGA TCAGCGAGGG GCGGCGTCTG AAGAAGCGGA AAACCTCCAT
2101 CAAGAAGAAC ACGCTGAACC CCACCTATAA TGAGGCGCTG GTGTTCGACG
2151 TGGCCCCCGA GAGCGTGGAG AACGTGGGGC TCAGCATCGC CGTGGTGGAC
2201 TACGACTGCA TCGGGCACAA CGAGGTGATC GGCGTGTGCC GTGTGGGCCC
2251 CGACGCTGCC GACCCGCACG GCCGCGAGCA CTGGGCAGAG ATGCTGGCCA
2301 ATCCCCGCAA GCCCGTGGAG CACTGGCATC AGCTAGTGGA GGAAAAGACT
2351 GTGACCAGCT TCACAAAAGG CAGCAAAGGA CTATCAGAGA AAGAGAACTC
2401 CGAGTGAGGG GTCTGGCCTA GGCCCGGGAT CGGACCAGGC TCCCTCAGGA
2451 CCCCATCCTT TCCTGCCCGG ACCGTGAATT CATCTCCTTG AAGCCATAAC
2501 GTCCGAGCTG CTGGTGCGGG GCAGCCCTGG CCCTAGGCTT CCTAACCCTG
2551 GAAGCGAGAG GATGAGAGGA GGCCGGCCCA GCTCCTTCTT TCAGGGTGGG
2601 GGTCATTCAG CCTCCACTGT GTCTGTCTTT TCTTCCCTGG GGCTCCCCCT
2651 CGAGGCGAGG GGCCATGCAT GTCTGGGGGA CCCCTGCCCC CCAAAACCCT
2701 CTGTCTGTCT CTGTCTCTTT GCTGTTTGTC CAAGACTCAG TGTCCCGACC
2751 CTTGTTCTCG CCGTGAATGT CAATGGGCCA ATCCTCTCTG TCCTTTCAGA
2801 CACACACACA CCTGTGTCCA CCCCTTCTGT TCGCCACACC CTGCGTCTGG
2851 CCGGTCCCCC CACTGCTGCT GCTATCAACG CCAGAATAAA CACACTCTGT
2901 GGGTCTCACT CCAAAAAAAA AAAAAAAAAA A



BLAST Results 



35 Entry MMABflT3_l from database TREMBL : 

product: "synaptotagmin 3"t Mus musculus mRNA for synapt otagrni n 
3-. 

complete cds. 

Score = 16m-, P = 5.76-23^-, identities = 3b2/M50-» positives = 
40 3h^/M50-. 
frame +5 



45 



Medline entries 



c lb0bM733: 

Fukuda M., Kojima T., Aruga J., Niinobe M., Mikoshiba K.;
Functional diversity of C2 domains of synaptotagmin family.
Mutational analysis of inositol high polyphosphate binding
domain. J Biol Chem 1995 Nov 3;270(44):26523-7

55 



Peptide information for frame 2

ORF from 635 bp to 2404 bp; peptide length: 590
Category: strong similarity to known protein
Classification: Cell signaling/communication
Prosite motifs: C2_DOMAIN_1 (323-338)
C2_DOMAIN_1 (455-470)


1 MSGDYEDDLC RRALILVSDL CARVRDADTN DRCQEFNDRI RGYPRGPDAD
51 ISVSLLSVIV TFCGIVLLGV SLFVSWKLCW VPWRDKGGSA VGGGPLRKDL
101 GPGVGLAGLV GGGGHHLAAG LGGHPLLGGP HHHAHAAHHP PFAELLEPGS
151 LGGSDTPEPS YLDMDSYPEA AAAAVAAGVK PSQTSPELPS EGGAGSGLLL
201 LPPSGGGLPS AQSHQQVTSL APTTRYPALP RPLTQQTLTS QPDPSSEERP
251 PALPLPLPGG EEKAKLIGQI KPELYQGTGP GGRRSGGGPG SGEAGTGAPC
301 GRISFALRYL YGSDQLVVRI LQALDLPAKD SNGFSDPYVK IYLLPDRKKK
351 FQTKVHRKTL NPVFNETFQF SVPLAELAQR KLHFSVYDFD RFSRHDLIGQ
401 VVLDNLLELA EQPPDRPLWR DIVEGGSEKA DLGELNFSLC YLPTAGRLTV
451 TIIKASNLKA MDLTGFSDPY VKASLISEGR RLKKRKTSIK KNTLNPTYNE
501 ALVFDVAPES VENVGLSIAV VDYDCIGHNE VIGVCRVGPD AADPHGREHW
551 AEMLANPRKP VEHWHQLVEE KTVTSFTKGS KGLSEKENSE



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphamy2_2o13, frame 2

TREMBL:MMAB893_1 product: "Synaptotagmin 3"; Mus musculus mRNA for
synaptotagmin 3, complete cds., N = 2, Score = 1614, P = 1.1e-239

>TREMBL:MMAB893_1 product: "synaptotagmin 3"; Mus musculus mRNA for
synaptotagmin 3, complete cds.
Length = 587

HSPs: 

Score = 1614 CE7E-E bits)^ Expect = l.le-E3T-i Sum P(E) = Lle- 
£3=5 

Identities = 3bE/44T (flO*)-, Positives = 3hT/44 c J (flEX) 

tfuery: 14E FAELLEPGSLGGSDTPEPSYLDMDSYPEXXXXXX- 
XXGVKPSflTXXXXXXXXXXXXXXXX 200 

FAELLEPG LGGS+ PEPSYL3>ni>SYPE GVKPSflT 

Sbjct: 143 

FAELLEPGGLGGSELPEPSYLDMDSYPEAAVASVVAAGVKPStSTSPELPSEGGTGSGLLL EQE 
Query: EDI 

XXXXXXXXXXX(2SHtf(3VTSLAPTTRYPALPRPLT(3tfTLTS<3Pl>XXXXXXXXXXXXXXXXX EbO 

(3SHSi3VTSLAPTTRYPALPRPLT(2(2TLT + tf D 

Sbjct: SD3 

LPPSGGGLPSA(2SHt3i3VTSLAPTTRYPALPRPLT<2<3TLTTl2AI>PSTEERPPALPLPLPGG EbE 
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tfuery: Bbl 

XXKAKLIGfllKPELYtfXXXXXXXXXXXXXXXXXXXXXXPCGRISFALRYLYGSDGLVVRI 3ED 

KAKLIGCIKPELYG 
PCGRISFALRYLYGSDtfLVVRI 

5 Sbjct: 2b3 EEKAKLIG<3IKPEL YGGTGPGGRRGGGSGEAGA 

PCGRISFALRYLYGSD<2LVVRI 31? 

Query' 321 

L(3ALDLPAKDSNGFSDPYVKIYLLPDRKKKF(3TKVHRKTLNPVFNETF£3FSVPLAELA(3R 3fiO 

10 

L(3ALDLPAKDSNGFSDPYVKIYLLPDRKKKFt3TKVHRKTLNP + FNETF(3FSVPL AELA(3R 
Sbjct: 31fi 

L(3ALDLPAKDSNGFSDPYVKIYLLPDRKKKF(3TKVHRKTLNPIFNETF(3FSVPLAELA(3R 37? 
15 Query- 3fll 

KLHFSVYDFDRFSRHDLIGtJVVLDNLLELAEdPPDRPLWRDIVEGGSEKADLGELNFSLC 4 40 

KLHFSVYDFDRFSRHDLIGflVVLDNLLELAEtfPPDRPLURDI+EGGSEKADLGELNFSLC 
Sbjct: 37fi 

20 KLHFSVYDFDRFSRHDLIGl2VVLDNLLELAE(3PPDRPLURDILEGGSEKADLGELNFSLC 437 
Query: 441 

YLPTAGRLTVTIIKASNLKAHDLTGFSDPYVKASLISEGRRLKKRKTSIKKNTLNPTYNE SDD 

25 YLPTAGRLTVTIIKASNLKAMDLTGFSDPYVKASLISEGRRLKKRKTSIKKNTLNPTYNE 
Sbjct: 43fl 

YLPTAGRLTVTIIKASNLKAriDLTGFSDPYVKASLISEGRRLKKRKTSIKKNTLNPTYNE 4^7 
Query: 5D1 

30 ALVFDVAPESVENVGLSIAVVDYDCIGHNEVIGVCRVGPDAADPHGREHbJAEJILANPRKP 5b0 

ALVFDVAPESVENVGLSIAVVDYDCIGHNEVIGVCRVGP+AADPHGREHWAEF1LANPRKP 
Sbjct: 4^fl 

ALVFDVAPESVENVGLSIAVVDYDCIGHNEVIGVCRVGPEAADPHGREHUAEflLANPRKP 557 

35 

Query: 5bl VEHU)H(2LVEEKTVTSFTKGSKGLSEKENSE 
VEHWH(3LVEEKT+ + SFTKG KGLSEKENSE 
Sbjct: 55B VEHWHdLVEEKTLSSFTKGGKGLSEKENSE 5fi7 

40 Score = SBD (7fl-D bits)-. Expect = l.le-E3T-. Sum P(E) = Lle-ES 11 ! 
Identities = Ifi/IDO (^fl*)-, Positives = ^/100 <n*) 

<3uery: 1 MSGDYEDDLCRR ALILVSDLC ARVRDADTNDRCdEFND- 

RIRGYPRGPDADISVSLLSVI 5^ 
45 MSGDYEDDLCRRALILVSDLCARVRDADTNDRCt2EFN + 

RIRGYPRGPDADISVSLLSVI 
Sbjct: 1 

MSGDYEDDLCRR ALILVSDLCARVRDADTNDRCflEFNELRIRGYPRGPDADISVSLLS VI bO 

50 Query: bO VTFCGIVLLGVSLFVSlilKLCUVPWRDKGGSAVGGGPLRKD ^ 

VTFCGIVLLGVSLFVSUKLCLIVPURDKGGSAVGGGPLRO 
Sbjct: bl VTFCGIVLLGVSLFVSWKLCWVPbJRDKGGSAVGGGPLRKD 1UU 



Pedant information for DKFZphamy2_2o13, frame 2
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ELENGTHJ 5^0 
EMUJ b33Em.D2 
EpIJ b-lb 

EH0M0LJ TREMBL : MM ABfl^3_l product: "sy napto t agmi n 3"=, Mus 

musculus mRNA for synaptotagmin 3-i complete cds- □•□ 
EFUNCATJ ^ unclassified proteins IS- cerevisiae-. YML072cJ 

be-lD 
EFUNCATJ 
• "ES- 
EFUNCATJ 
7e-0b 
EBL0CKSJ 
proteins 
EBL0CKSJ 
EBL0CKSJ 

ESC0PJ dla25a_ 2 . b - 1 . 

(beta) ERa 2e-27 

ESC0PJ dlrsy Synaptogamin It first C2 domain 

ERat (Rattu Me-H3 

2 A domain from cytosolic 



Dl-Ob-Dl lipid-i fatty-acid and sterol biosynthesis 
cerevisiaei YGR170wJ 7e-Db 
3D-D6 organization of golgi ES- cerevisiaei YGR170wJ 

BLD122MA N-acetyl-gamma-glutamyl-phosphate reductase 

BL01D13B Oxysterol-binding protein family proteins 
PF013bfiB 

►2*2 C2 domain from protein kinase c 



ESCOPJ 

phosphol ipase A2 
ESCOPJ 

phospholipase 
EPIRKtO 
EPIRKUJ 
EPIRKUJ 
EPIRKUJ 
EPIRKUJ 
EPIRKUJ 



dlr 1 w 2 - b • 1 - 1 . 

EHuma Se-12 



1 Phosphoinositide-specific 



dlqasb2 
C Me-27 
phosphotransf erase 7e-15 
duplication be-7b 
synaptic vesicle le-lb7 
phorbol ester binding 2e-lM 
zinc 2e-lM 

transmembrane protein □•□ 
EPIRKUJ serine/threonine-specif ic protein kinase 7e-15 

EPIRKUJ membrane trafficking D-D 

EPIRKUJ phospholipid binding be-7b 

EPIRKUJ autophosphorylation 7e-15 

EPIRKUJ ATP 7e-15 

EPIRKUJ phosphoprotein 7e-l£ 

EPIRKUJ glycoprotein le-lb? 

EPIRKUJ calcium binding 5e-3M 

EPIRKUJ alternative splicing le-10 

EPIRKUJ dimer le-75 

EPIRKUJ membrane protein le-lb? 

EPIRKUJ calmodulin binding 2e-7M 

ESUPFAMJ ras-specific GAP catalytic domain homology le-OB 

ESUPFAMJ protein kinase C zinc-binding repeat homology 7e-15 

ESUPFAMJ protein kinase homology 7e-15 

ESUPFAMJ protein kinase C alpha 7e-15 

ESUPFAMJ HsC2 phosphat i dy 1 i nos i to 1 3-kinase le-DT 

ESUPFAMJ synaptotagmin D-D 

ESUPFAMJ PX domain homology le-OT 

ESUPFAMJ pleckstrin repeat homology le-Dfl 

ESUPFAMJ protein kinase C C2 region homology 0-0 

EPR0SITE3 C2_J>0MAIN_1 2 

EPFAMJ C2 domain 

EKUJ Irregular 

EKUJ 3D 

EKUJ L0U_C0MPLEXITY 2D • DO 
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SE<2 nSGDYEDDLCRRALILVSDLCARVRDADTNDRCflEFNDRIRGYPRGPDADISVSLLSVIV 

SEG 

Irsy- 

5 

SEtf TFCGIVLLGVSLF VSbJICLClJ VPURDKGGSA VGGGPLRKDLGPGVGLAGL VGGGGHHLAAG 

SEG xxxxxxxxxxxxxxxxxxxxx 

Irsy- 

10 

SEC LGGHPLLGGPHHHAHAAHHPPFAELLEPGSLGGSDTPEPSYLDMDSYPEAAAAAVAAGVK 

SEG xxxxxxxxxxxxxxxxxxxxx xxxxxxxx- . . 

Irsy- 

15 

SEfl PSC3TSPELPSEGG AGSGLLLLPPSGGGLPSAt3SH(3(3VTSLAPTTRYPALPRPLTQ(3TLTS 

SEG . . . - xxxxxxxxxxxxxxxxxxxxxxxxxxx 

Irsy- 

20 

SEO flPDPSSEERPPALPLPLPGGEEKAKLIGCIKPELYCGTGPGGRRSGGGPGSGEAGTGAPC 

SEG - . . xxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxx - . 

Irsy- 

25 

SEtf GRISFALRYLYGSD(3LVVRILt3ALDLPAKDSNGFSDPYVKIYLLPl>RKKKF(3TKVHRKTL 

SEG 

Irsy- 

30 CEEEEEEEEETTTTEEEEEEEEEECCCCCBTTTBBCEEEEEEEETTTTTTEECCCTTTBT 

SE(3 NPVFNETFd FSVPLAEL At3RKLHFSVYDFDRFSRHDLIG(3 VVLDNLLEL AEOPPDRPLUR 

SEG 

Irsy- 

35 TTEEEEEEEECCCHHHHHCCEEEEEEEECTTTTCCEEEEE 

SE<2 DIVEGGSEKADLGELNFSLCYLPTAGRLTVTIIKASNLKAnDLTGFSPPYVKASLISEGR 

SEG 

Irsy- 

40 

SE<3 RLKKRKTSIKKNTLNPTYNEALVFPVAPESVENVGLSIAVVDYDCIGHNEVIGVCRVGPD 

SEG 

Irsy- 

45 

SE<2 AADPHGREHUAEHLANPRKPVEHlJHflLVEEKTVTSFTKGSKGLSEKENSE 

SEG 

lrsy- 

50 



Prosite for DKFZphamy2_2o13.2

PS00499 323->339 C2_DOMAIN_1 PDOC00380

PS00499 455->471 C2_DOMAIN_1 PDOC00380
Pfam for DKFZphamy2_2o13.2



5 HHn__NAHE C2 domain 

win 

*LtVrIIeARNLUknDnn6f SDPYVKVdndPdpkDtkKLJKTkTibJNWGLN 

L+VRI++A +L+++D+NGFSDPYVK++++PD+K 

10 KK++TK+++++ LN 



duery 31b L VVRIL(3ALDLPAKDSNGFSDPYVKIYLLPDRK — 

KKFC2TKVHRKT-LN 3bl 

HHI1 PVWNEEeFvFedlPyPdlqrkNLRFaVUDUDRFSRBDFIGHCi* 

15 PV+N E + F + F +P+ +L+ + L+F+V+D+DRFSR+D+IG+++ 

duery 3bB PVFN-ETFi2FS-VPLAELA(3RKLHFSVYDFDRFSRHDLIG(2VV 
MOE 



20 *LtVrIIeARNLUkf1DrinGf SDPYVKVdMdPdpkDtkKUKTkTiUNNGLN 

LTV+II+A NL++HD +GFSDPYVK +++ + 

+++KK+KT++++N+ LN 
fluery MMfi 

LTVTIIKASNLKANDLTGFSDPYVKASLISEGRRLKKRKTSIKKNT-LN 

25 

HUN PVUNEEeFvFedlPyPdlqrknLRFaVUDWDRFSRBDFIGHCi* 

P++N E +VF+ ++ ++ +++ L + AV D+D+++++++IG+C+ 
t3uery HTL PTYN-EALVFD- VAPES VENVGLSIA VVDYDCIGHNE VIGVCR 

53b 

30 
DKFZphamy2_7j5



group: differentiation/development 


DKFZphamy2_7j5 encodes a novel 693 amino acid protein with
similarity to Tspyl1, the testis-specific Y-encoded-like protein of
Mus musculus.

TSPY genes are arranged in clusters on the Y chromosome of many
mammalian species. TSPY is believed to function in early
spermatogenesis and is a candidate for GBY, the putative
gonadoblastoma-inducing gene on the Y. The TSPY family forms part
of a superfamily, TTSN, with autosomal representatives, highly
conserved in mammals and beyond.



The new protein can find application in studying the expression
profile of testis- and brain-specific genes and in the diagnosis or
therapy of impaired male fertility.

HRIHFB2216

similarity to Y-linked gene of Mus musculus

Sequenced by DKFZ

Locus: unknown

Insert length: 2819 bp

Poly A stretch at pos. 2800, polyadenylation signal at pos. 2779



1 AGG AGAGCTG GTTGCGTGAG 
35 51 _GCGG_CAGCGA CGCGGCTAAA 

101 TGTACGAACG CGGTCGCCAT 
151 GACCCGCCGC CTGAGCAGCT 
201 CGCCGCCGCC GCCGCCGCTC 
251 CGCCCGAGGC TCCAGGAGGA 
40 301 GAGGGGGGTG GGACTGGGCC 

351 TTCTCGAGGA GGGGGGGATC 
401 CCCGGCTGGG ATTCTACCAT 
M51 CACGGAGAGC CTGGAAGCAC 
501 TGGAAATCGA TTTTCAGGTT 
45 551 GCCCTAGAAA CCTGTAGCGC 

bOl CCCGAAGAGC AAGGAAGAGG 
b51 ATGAGCGGGA GAGTATGAGG 
701 AGGAAGCAGA GGAAGGTGAA 
751 GATGGAGAGC ATCCTGCAGG 
50 flOl CAGTGAACAT CAAGGCAGGC 

S51 ATCCAGATGC GAAGACCCTT 
TDl TATCCCAGGC TTCTGGGTCA 
^51 TTTTGATCAA CCGACGTGAT 
1001 CAGGTACAGG ATCTCAGACA 
55 1051 CTTCCAGACT AACCCCTACT 
1101 AGCGCA ACCG CTCAGGCCGG 
1151 CACCGGGGCC AGGA ACCCCA 
1201 CCACAGCTTT TTCAGCTGGT 



TCTCCTCAGC TCTGCTTACC GGTGCG ACTA 

AGCG AAGGGG CGAGTGCGAG TCCCCTGAGC 

GGACCGCCCA GATGAGGGGC CTCCGGCCAA 

CCGAGTCTCC ACAGCGCG AC CCGCCCCCGC 

CTCCGACTGC CGCTGCCTCC ACCCCAGCAG 

AACGGAGGCG GCACAGGTGC TGGCCGATAT 

CCGCGCTGCC CCCGCCGCCT CCCTATGTCA 

CGCGCAT ACT TCACGCTCGG TGCTGAGTGT 

CGAGTCGGGG TATGGGGAGG CGCCCCCGCC 

TCCCCACTCC TGAGGCCTCG GGGGGGAGCC 

GTACAGTCGA GCAGTTTTGG TGGAGAGGGG 

AGTGGGGTGG GCGCCCCAGA GGTTAGTTGA 

CGATC ATCAT AGTGGAGGAT GAGGATGAGG 

AGCAGCAGGA GGCGGCGGCG GCGGCGGAGG 

GAGGGAAAGC AGAGAGAGAA ATGCCGAGAG 

CACTGGAGGA TATTCAGCTG GATCTGGAGG 

AAAGCCTTCC TGCGTCTCAA GCGCAAGTTC 

CCTGGAGCGC AGAGACCTCA TCATCCAGCA 

AAGCATTCCT CAACCACCCC AGAATTTCAA 

GAAGACATTT TCCGCTACTT GACCAATCTG 

TATCTCCATG GGCTACAAAA TGAAGCTGTA 

TCACAA ACAT GGTGATTGTC AAGGAGTTCC 

CTGGTGTCTC ACTCAACCCC AATCCGCTGG 

GGCCCGTCGT CACGGGAACC AGGATGCGAG 

TCTC A A ACCA TAGCCTCCCA GAGGCTGACA 
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1251 GGATTGCTGA GATT ATC A AG A ATGATCTGT GGGTTAACCC TCTACGCT AC 

13D1 TACCTGAGAG AAAGGGGCTC CAGGATAAAG AGAAAGAAGC AAGAAATG A A 

1351 GA A ACGTAA A ACC AGGGGCA GATGTG AGGT GGTGATCATG GAAGACGCCC 

1M01 CTGACTATTA TGCAGTGGAA GACATTTTCA GCGAGATCTC AGACATTGAT 

5 1M51 GAGACAATTC ATG ACATC A A GATCTCTGAC TTCATGG AGA CCACCGACTA 

15D1 CTTCGAGACC ACTGACAATG AGATAACTG A CATC AATGAG A ACATCTGCG 

1551 ACAGCG AGAA TCCTGACCAC A ATGAGGTCC CCAACAACGA G ACCACTG AT 

lt,01 AACAACGAGA GTGCTGATG A CCACGAAACC ACTGACAACA ATGAGAGTGC 

lb51 AGATGACAAC AACGAGAATC CTGAAG ACAA TAACAAGAAC ACTGATGACA 

10 17D1 ACGAAGAGAA CCCTAACAAC AACGAGAACA CTTACGGCAA CAACTTCTTC 

1751 AAAGGTGGCT TCTGGGGC AG CCATGGCA AC AACCAGGACA GCAGCGACAG 

IfiDl TGACAATGAA GC AG ATG AGG CCAGTGATGA TGAAGATAAT GATGGCAACG 

1B51 AAGGTGACAA TGAGGGCAGT GATGATGATG GCAATGAAGG TGACAATGAA 

1101 GGCAGCGATG ATGACGAC AG AGACATTGAG TACTATGAGA AAGTTATTGA 

15 1^51 AGACTTTGAC AAGGATCAGG CTGACTACGA GGACGTGATA GAGATCATCT 

2D01 CAGACGAATC AGTGGAAGAA GAGGGCATTG AGGAAGGCAT CCAGCA AGAT 

2D51 GAGGACATCT ATGAGGA AGG AAACTATGAG GAGGAAGGAA GTGAAGATGT 

E1D1 CTGGG AAGA A GGGG A AG ATT CGGACGACTC TGACCT AGAG GATGTGCTTC 

2151 AGGTCCCAAA CGGTTGGGCC A ATCCGGGGA AGAGGGGGAA AACCGGATAA 

20 2201 GGGTTTTCCC CTTTTGGGGA TCACCTCTCT GTATCCCCCA CCCACTATCC 

2251 CATTTGCCCT CCTCCTCAGC TAGGGCCACG CGGCCCCACA TTGCACTTCT 

23D1 GGGGGGTGAC CGACTTCGTA CACGGGTTTA A AGTTTATTT TTATGGTTTA 

2351 GTCATTGCAG AGTTCTTATT TTGGGGGGAG GGAAAGGGGG CTAGTCCCCT 

2M01 TCTTTTGGCC CTCCGCCCCC GCAGGCTTCT GTGTGCTGCT AACTGTATTT 

25 2M51 ATTGTGATGC CTTGGTCAGG GCCCCTCTAC CCACTTCTCC CAGTCAGTTG 

25D1 TGGCCCCAGC CCCTCTCCCT GTGCTGTGTG GAGTGGACAC CCTGACCCCC 

2551 GAAGCGGGGA GGGCCGCTGT GGCCTTCGTC ACAGCCGCGC AGTGCCCATG 

2bDl GAGGCGCTGC TGCCACCTTC CTCTCCCAAG TTCTTTCTCC ATCCCTCTCC 

2b51 TCTTCCCGCC GCGCCGCT AG CCCGCCTCGG TGTCTATGCA AGGCCGCTTC 

30 2701 GCCATTGCGG TATTCTTTGC GGTATTCTTG TCCCCGTCCC CCAGAAGGCT 

2751 CGCCTCTCCC CGTGGACCCT GTTAATCCCA ATAAA ATTCT GAGCAAGTTT 

2601 AAAAA A AAAA AAA A AAA A A 

35 BLAST Results 

No BLAST result 

Medline entries
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^fi3 c i c 3filDM: 

Vogel T, Dittrich O, Mehraein Y, Dechend F, Schnieders F,
Schmidtke J; Murine and human TSPYL genes: novel members of the
TSPY-SET-NAP1L1 family. Cytogenet Cell Genet 1998; 81(3-4): 265-70



Peptide information for frame 2 



ORF from 119 bp to 2197 bp; peptide length: 693
Category: similarity to known protein
Classification: unclassified
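
As a consistency check on the ORF annotation, the start of the peptide can be re-derived from the nucleotide listing. The following is a minimal sketch, not part of the original analysis: it translates the first ten codons with the standard genetic code. The nucleotide string is a reading of positions 119-148 of the insert (taking the ORF start as position 119, which is an assumption where the printed coordinate is damaged), and the names `CODON_TABLE` and `translate` are illustrative.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Assumed reading of insert positions 119-148 (hypothetical transcription
# of the OCR-damaged listing).
orf_start = "ATGGACCGCCCAGATGAGGGGCCTCCGGCC"
print(translate(orf_start))  # MDRPDEGPPA
```

Under these assumptions the translation agrees with the first ten residues of the frame 2 peptide listed below.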



1 NDRPDEGPPA KTRRLSSSES PCRDPPPPPP PPPLLRLPLP PPGCRPRLdE 

51 ETEA AC3VL AD PIRGVGLGPAL PPPPPYVILE EGGIRAYFTL GAECPGUDST 

101 IESGYGEAPP PTESLE ALPT PEASGGSLEI DF<2VV(3SSSF GGEGALETCS 

151 AVGUAPGRLV DPKSKEEAII IVEDEDEDER ESNRSSRRRR RRRRRKC2RK V 

EDI KRESRERNAE RNESILdALE DIflLDLEAVN IK AGKAFLRL KRKFICMRRP 

251 FLERRDLIIC3 HIPGFUVKAF LNHPRISILI NRRDEDIFRY LTNLflVGDLR 

301 HISIIGYKIIKL YFdTNPYFTN MVIVKEFflRN RSGRLVSHST PIRWHRGflEP 

351 C2 ARRHGNtfDA SHSFFSUFSN HSLPE ADRI A EIIKNDLUVN PLRYYLRERG 

MD1 SRIKRKKtfEfl KKRKTRGRCE VVIHEDAPD Y YAVEDIFSEI SDIDETIHDI 

M51 KISDFflETTD YFETTDNEIT DINENICDSE NPDHNEVPNN ETTDNNESAD 

501 DHETTDNNES ADDNNENPED NNKNTDDNEE NPNNNENTYG NNFFKGGFUG 

551 SHGNNdDSSD SDNE ADEASD DEDNDGNEGD NEGSDDDGNE GDNEGSDDDD 

bDl RDIEYYEKVI EDFDKDflADY EDVIEIISDE SVEEEGIEEG ItfdDEDIYEE 

b51 GNYEEEGSED VWEEGEDSDD SDLEDVLtfVP NGUANPGKRG KTG 



BLASTP hits 



20 



No BLASTP hits available 



Alert BLASTP hits for DKFZphamy2_7j5, frame 2



TREMBL:AB015345_1 gene: "HRIHFB2216"; Homo sapiens HRIHFB2216 mRNA,
partial cds., N = 4, Score = 1393, P = 2.1e-165

TREMBL:HSDJ486I3_2 gene: "dJ486I3.2"; product: "dJ486I3.2
(KIAA0721 (NAP (Nucleosome Assembly Protein) domain containing
protein))"; Human DNA sequence from clone 486I3 on chromosome
6q22.1-22.3. Contains the part of a gene for a novel protein, the
gene for KIAA0721 (NAP (Nucleosome Assembly Protein) domain
containing protein), the TSPYL gene for TSPY-like (testis specific
protein, Y-linked like), and an RPS5 (40S Ribosomal Protein S5)
pseudogene. Contains ESTs, STSs, GSSs and two putative CpG
islands., N = 1, Score = 570, P = 3.4e-55



>TREMBL:AB015345_1 gene: "HRIHFB2216"; Homo sapiens HRIHFB2216
mRNA, partial cds.

Length = 486

50 HSPs: 

Score = 13=33 ( 2DT • D bits)-, Expect = 2.1e-lb5-, Sum P(H) = 2-le- 
lb5 

Identities = 2bfl/2^S (ID*)-, Positives = 2t 3 fi/2 c JS 

55 

(3uery: 205 

NAER(1ESIL(3ALEDI(3Ll)LEAVNIKAGKAFLRLKRKFIt3riRRPFLERRDLII(3HIPGFLjV 2b7 



mRNAi 



Human 
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NAERI1ESIL(3ALEDI(3LDLEAVNIKAGKAFLRLKRKFI(3l1RRPFLERRDLII(3HIPGFlilV 
Sbjct: 1 

NAERHESIL(3ALEDIt3LDLEAVNIKA6KAFLRLKRKFI(3nRRPFLERRDLII(3HIPGFUV bO 

5 

Query: Bbfl 

KAFLNHPRISILINRRDEDIFRYLTNL(3V(3DLRHISri6YKnKLYFi3TNPYFTNnVIVKEF 327 

KAFLNHPRISILINRRDEDIFRYLTNL(3VC3]>LRHISnGYKriKLYF(2TNPYFTNriVIVKEF 
10 Sbjct: bl 

KAFLNHPRISILINRRI>EI>IFRYLTNL(3Vc3I)LRHISHGYKnKLYF(3TNPYFTNriVIVICEF 120 
<2uery: 32B 

(3RNRSGRLVSHSTPIRblHRGdJEP<3ARRHGNt3DAXXXXXXXXXXXXLPEADRIAEIIKNDL 3fl7 
15 <3RNRSGRLVSHSTPIRUHRG(3EP<2ARRHGNi3DA 
LPEADRIAEIIKNDL 
Sbjct: 1E1 

(3RNRSGRLVSHSTPIRUHRG(3EP(3ARRHGN(3PASHSFFSUFSNHSLPEADRIAEIIKN1)L IflO 



20 fluery: 3fifi 

WVNPLRYYLRERGSXXXXXXXXXXXXXXXGRCEVVIflEDAPDYYAVEDIFSEISDIDETI 4 47 

UVNPLRYYLRERGS 
GRCEVVIMEDAPDYYAVEDIFSEISDIDETI 
Sbjct: Ifil 

25 UVNPLRYYLRERGSRIICRKK(2EnKKRKTRGRCEVVIf1EDAPI>YYAVEDIFSEISDIDETI 240 



(Juery: 44fl 

HDIKISDFflETTDYFETTPNEITDINENICDSENPDHNEVPNNETTDNNESADDH 5D2 

30 HDIKISDFHETTDYFETTDNEITDINENICDSENPDHNEVPNNETTDNNESADDH 
Sbjct: E41 

HDIKISDFHETTDYFETTDNEITDINENICDSENPDHNEVPNNETTDNNESADDH 2=15 

Score = 117 (17. b bits)-, Expect = ^-De-IT-, Sum P<4) = T-De-n 
35 Identities = 32/77 (Ml*)-, Positives = MM/77 (572) 



<2uery: 42b 

DAPDYYAVEDIFSEISDIDETIHDIKISPFMETTBYFETTDNEITDINENICDSENPDHN 4flS 
+ DY+ D +EI+DI+E ID E D+ E +NE TD NE+ 

40 D E D+N 

Sbjct: BSD ETTDYFETTD--NEITDINENICD 

SENPDHNEVPNNETTDNNESADDHETTDNN 3D1 



(3uery: 4flb EVP--NNETT-DNNESADDH 5D2 
45 E NNE DNN++ DD + 

Sbjct: 302 ESADDNNENPEDNNKNTDDN 321 



50 



Score = T4 (14.1 bits)-. Expect = 2-le-lb5-, Sum P(4) = 2-le-lbS 
Identities = lb/lb (!□□%) -i Positives = lb/lb (100/C) 

fluery: b7fl (2VPNGWANPGKRGKTG b^3 

CVPNGtiJANPGKRGKTG 
Sbjct: 471 tfVPNGUANPGKRGKTG 4flb 

55 Score = (13-5 bits)-. Expect = LTe-lb-. Sum P(4) = T-Te-lb 
Identities = 34/35 ( 402 ) i Positives = 45/flS (52V.) 
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Query : MEb D APDYYA VEDIFSEISDIDETIHDIKISDFI1E TTDYFETTDN- 

EITDINENICDS M?^ 

+ DY + D +EI+DI+E I D + D E TTD E+ D + E TD 

NE + D + 

5 Sbjct: BSD ETTDYFETTD-- 

NEITDINENICDSENPDHNEVPNNETTDNNESADDHETTDNNESADDN 3D7 

Query: MAO -ENPDHN-- -E VPNN-ETTDNN MTb 

ENP+ N E PNN E T N 

10 Sbjct: 3Dfi NENPEDNNKNTDDNEENPNNNENTYGN 33M 

Score = fl? (13-1 bits)-, Expect = E-le-lbS-, Sum PCM) = E.le-lbS 
Identities = 1M/1M C10DS)-. Positives = 1M/1M ODD*) 

15 Query: 5M3 FFKGGFb)GSHGNNi2 S5b 

FFKGGFUGSHGNNfl 
Sbjct: 33b FFKGGFWGSHGNNfl 3M^ 

Score = fi5 C1E-6 bits)-. Expect = £.le-lb5-, Sum PCM) = E.le-lbS 
20 Identities = Ib/lfi Cflfi*)., Positives = 17/16 CTM*/.) 

tfuery: bDl RDIEYYEKVIEDFDKDC3A blfl 

RDIEYYEK IEDFD + D(2A 
Sbjct: 3TM RDIEYYEKGIEDFDRDflA Mil 

Score = bO • □ bits)-, Expect = 5.3e-D3-. Sum PCM) = 5-3e-03 
Identities = El/bb C31Z)-, Positives = 33/bb (5D*) 



25 



Query: MEb D APDYYA VEDIFSEISDI DETIHD-IKIS- 
30 DFMETTDYFETTDNEITDINENICDSENPD Mfi3 

D BY V +1 S + S + E I + 1+ D E +Y E ++ + E + 

DS+ D 

Sbjct: MO^ 

DflADYEDVIEIISDESVEEEGIEEGItffiDEDIYEEGNYEEEGSEDVWEEGEDSDDSDLED Mbfi 

35 _ _ . .. 

Query: MflM HNEVPN MBT 
+ VPN 

Sbjct: Mb^ VLdVPN M7M 

40 Score = M^ C7-M bits)-, Expect = LMe-Ob-, Sum PCM) = 1-Me-Db 
Identities = IE/35 C3MZ)-, Positives = El/35 CbO*) 

Query: Mb3 ETTDNEITDINENICDSENPDHNEVPNNETTDNNE MT7 
E + D+E D NE + + D NE +NE + D + + + 
45 Sbjct: 3b0 EASDDEDNDGNEGDNEGSDDDGNE-GDNEGSDDDD 3^3 

Score = ME Cb-3 bits)-, Expect = 7.Ee-Db-. Sum PCM) = 7-Ee-Ob 
Identities = 11/37 CET/S)-, Positives = lfi/37 CMfiX) 

50 <3uery: Mb5 TDNEITDINENICDSENPDHNEVPNNETTDNNESADD 501 

+ DNE + + D E+ D NE N + D + D+ 
Sbjct: 3SM SDNEADEAS -DDEDNDGNEGDNEGSDDDGNEGDN 3flb 

Pedant information for DKFZphamy2_7j5, frame 2

Report for DKFZphamy2_7j5.2
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[LENGTH!! ^3 

EMbO 7TM3S-D7 

5 IpU M.MS 

CHOriOLJ TREflBL: AB0153M5_1 gene: n HRIHFB221L>" n Homo sapiens 

HRIHFB221b mRNA-. partial cds- le-171 

EFUNCATID Db-10 assembly of protein complexes ES- cerevisiae-. 

YKRDMficJ Me-DS 

10 EFUNCAO 03-22 cell cycle control and mitosis ES. cerevisiae-. 

YKRDMflc J Me-D5 

EFUNCATI 03 • DM budding-, cell polarity and filament formation 

ES. cerevisiae-. YKROMflcJ Me-D5 

EFUNCATID OT-13 biogenesis of chromosome structure ES. 
15 cerevisiae-. YKRDMficH He-D5 

EFUNCAT3 3D-1D nuclear organization ES. cerevisiaei YKROMficID 
Me-05 

EBLOCKSJ BPD2tMbH 

EBLOCKSJ BP02b4bE 

20 EBLOCKSID PFD0M24 A 

EBLOCKSJ BLD0415N Synapsins proteins 

EBLOCKSJ BPD27^TE 

CBL0CKS3 BLDOOMfi Protamine PI proteins 

EBLOCKSJ PRODDED 

25 EBLOCKSH) PFDDTSbD 

EBL0CKS2 PFDtmbC 

EBL0CKS3 PFDDTSbB 

EPIRKIO nucleus fie-33 

EPIRKbD phosphoprotein fie-33 

30 EPIRKIO alternative splicing fie-33 

C KU 3 Alpha_Beta 

EKIO LOW_COriPLEXITY 35-35 * 

35 SE<3 HDRPDEGPPAICTRRLSSSESR(2RDPPPPPPPPPLLRLPLPPP(3(3RPRL(3EETEAAC3VLA1> 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhh 

SE(2 nRGVGLGPALPPPPPYVILEEGGIRAYFTLGAECPGUDSTIESGYGEAPPPTESLEALPT 

40 SEG - .--xxxxxxxxxxx 

PRD ccccceeeeccccccccccccccceeeeccccccccccceeecccccccccchhhhhhhh 

SEC PEASGGSLEIDF(3VV(2SSSFGGEGALETCSAVGtoAPl3RLVDPKSKEEAIIIVEDEDEDER 

SEG xxxxxxxx 

45 PRD hcccccccccceeeeecccccchhhhhhhhccccccccccccchhhhhhhhhhhhhhhhh 

SEC ESNRSSRRRRRRRRRK(2RKVKRESRERNAERnESIL(2ALEDI(3LDLEAVNIKAGKAFLRL 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhh 

50 

SE(2 KRKFl£3MRRPFLERRDLIIi3HIPGFlilVKAFLNHPRISILINRRDEDIFRYLTNLr3V(2DLR 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhccccceeeccccccchhh^ 

55 SEC2 HISnGYKf1KLYF(3TNPYFTNI1VIVKEF(3RNRSGRLVSHSTPIRUHRG(3EPi3ARRHGN(3DA 

SEG 

PRD cccccceeeeeeccccccchhhhhhhcccccccceeeccccccccccccccccccccccc 
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SE<3 SHSFFSUFSNHSLPEADRIAEIIKNDLL)VNPLRYYLRERGSRIKRKK(3EriKKRKTRGRCE 

SEG xxxxxxxxxxxx xxxxxxxxxxxxxxx • . • . 

PRD ccccGGGCCccccccchhhhhhhhhhhhcccchhhhhhhhhhhhhhhcceeeeecccccc 

5 SE(3 VViriEDAPDYYAVEPIFSEISDIDETIHDIKISDFnETTDYFETTDNEITDINENICDSE 

SEG 

PRD eeeccccccceeehhhhhhhhhhccccccceeeGGCcccccccccchhhhhhhhcccccc 

SEfl NPDHNEVPNNETTDNNESADDHETTDNNESADDNNENPEDNNKNTDDNEENPNNNENTYG 

10 SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccceeGGCccccccccccccccccchhhhhcccccceeeeeccccccccccccccc 

SEt3 NNFFKGGFUGSHGNNtfDSSDSDNEADEASDDEDNDGNEGDNEGSDDDGNEGDNEGSDDDD 

SEG XX xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

15 PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SE<2 RDIEYYEKVIEDFDKDi3ADYEDVIEIISDESVEEEGIEEGI(3t2DEDIYEEGNYEEEGSED 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cchhhhhhhhhhhcccccchhhhheeeecccccccccccccccccceeecccccccccce 






SEt2 VUEEGEDSDDSDLEDVL<3VPNGWANPGKRGKTG 

SEG xxxxxxxxxxxxxxxxx 

PRD eeecccccccccceeeeeccccccccccccccc 



(No Prosite data available for DKFZphamy2_7j5.2)
(No Pfam data available for DKFZphamy2_7j5.2)



Pedant information for DKFZphamy2_7j5, frame 3
Report for DKFZphamy2_7j5.3



ELENGTH J 15D 

emio ibfiio-bT 

EpIJ IB-fifi 
40 EBLOCKSJ PRDD3DAA 
EKW3 All_Alpha 

EKLO L0W_C0MPLEXITY bl-33 X 

45 SE(2 MRTSATARILTTnRSPTTRPLITTRVLf1TTKPLTTf1RV(3nTTTRILKTITRTLnTTKRTL 

SEG xxxxxxxxxxxxxxxxxxxxx 

PRD ccchhhhhhhhhhccccccccccGGeeccccccchhhhhhhhhhhhhhhhhhhccccccc 

SE(3 TTTRTLTATTSSKVASGAAI1ATTRTAATVTI1K<3nRPVnriKIMI1ATKVTriRAVfirinAI1KVT 

50 SEG xxxxxxxxxx- ..... . xxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxx 

PRD ccccceeecccccccchhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhh 

SEC riKAAMMTTETLSTriRKLLKTLTRIRLTTRT 

SEG xxxxxxxx- xxxxxxxxxxxxxxxxxxxxx 

55 PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhccc 



(No Prosite data available for DKFZphamy2_7j5.3)
(No Pfam data available for DKFZphamy2_7j5.3)



DKFZphfbr2_78c12



group: nucleic acid management 

DKFZphfbr2_78c12 encodes a novel 528 amino acid protein with high
similarity to glutamyl-tRNA (Gln) amidotransferase subunit A of
the hyperthermophilic bacterium Aquifex aeolicus.



The novel protein contains one ATP/GTP-binding site motif A
(P-loop). This loop interacts with one of the phosphate groups of
an A or G nucleotide. It is found in numerous ATP- or GTP-binding
proteins, such as ATP synthase alpha and beta subunits, myosin
heavy chains, kinesin heavy chains and kinesin-like proteins,
dynamins and dynamin-like proteins, several kinases, DNA and RNA
helicases, GTP-binding elongation factors and the Ras family of
GTP-binding proteins. The protein seems to be expressed
ubiquitously.
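
The P-loop assignment can be illustrated with a short pattern search. This is a sketch under stated assumptions, not the procedure actually used for the annotation: the regular expression encodes the commonly cited PROSITE form of motif A, [AG]-x(4)-G-K-[ST], and the peptide fragment is a reading of residues 101-130 of the frame 3 peptide from the damaged listing. The function name `find_p_loops` and its `offset` parameter are hypothetical.

```python
import re

# ATP/GTP-binding site motif A (P-loop): [AG]-x(4)-G-K-[ST].
# The lookahead lets overlapping occurrences be reported as well.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(peptide, offset=1):
    """Return (first, last) residue numbers of every P-loop match.

    `offset` is the residue number of peptide[0], so fragments can be
    reported in the coordinates of the full-length protein.
    """
    return [
        (m.start() + offset, m.start() + offset + 7)  # motif spans 8 residues
        for m in P_LOOP.finditer(peptide)
    ]


# Assumed reading of residues 101-130 of the frame 3 peptide.
fragment = "NATVVQKLLDQGALLMGKTNLDEFAMGSGS"
print(find_p_loops(fragment, offset=101))  # [(112, 119)]
```

With this reading of the fragment, the single match GALLMGKT falls at residues 112-119.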

The new protein can find application in the modulation of
translational pathways.



similarity to glutamyl-tRNA (Gln) amidotransferase subunit A
(Aquifex aeolicus)

Sequenced by MediGenomix 

Locus: /map="686.3 cR from top of Chr6 linkage group"

Insert length: 3244 bp

Poly A stretch at pos. 3222, polyadenylation signal at pos. 3204



40 1 
51 
101 
151 
201 

45 251 
301 
351 
L401 
451 

50 501 
551 
bOl 
b51 
701 

55 751 
flOl 
flSl 
S01 



AGTGACAATT 
ACCTCTTTTT 
GACCATGCTG 
GCCAAATTAC 
AAGGCCAAGT 
AAAAC AAGCT 
GGGATTTAGA 
GGCATTGAGA 
TTATAATGCT 
TGGGAAAAAC 
GGTGTATTTG 
AGAAAAGAGG 
TGATAACTGG 
ACATGCTACG 
TGCTGCCCAC 
CCCGTCATGG 
TTAACCAGAT 
ACCTGACCCC 
TCATGCTTCC 



AAAGATGGCT 
CCCCCTTGCC 
GGCCGGAGCC 
ACCAACAGAG 
TTCTAAATGC 
GAAGAATCAG 
TGGAATTCCT 
CAACATGTGC 
ACAGTAGTTC 
AAATTTAGAT 
GACCAGTTAA 
AAGCAGAATC 
AGGAAGCCCA 
CGGCTTTAGG 
TGTGGGCTTG 
TCTCATTCCC 
GTGTGGATGA 
AGGGACTCTA 
CAGTTTGGCA 



GCGCCCATGT 
TGGCTCCTGT 
TCCGAG AAGT 
CTCTGTC A A A 
CTAC ATTACT 
AAAAG AG ATA 
ATTGC AGTA A 
ATCAA ATATG 
AGAAGTTGTT 
GAGTTTGCTA 
AAACCCCTGG 
CCCACAGCGA 
GGTGGGAGTG 
ATCAGAT ACA 
TTGGTTTCAA 
CTGGTG AATT 
TGCAGC AATT 
CCACAGTACA 
GATGTG AGC A 



AACATCACTA 
GGTGGCAGGC 
TTCTGCGGCA 
AATGTCTCTC 
GTGTCAGAAG 
TAAGAATGGA 
AAGACAATTT 
CTGAAAGGTT 
GGATCAGGGA 
TGGGATCTGG 
AGTT ATTCAA 
GAATGAAGAT 
CAGCTGCTGT 
GGAGGATCGA 
ACCAAGCTAT 
CGATGGATGT 
GTGTTGGGTG 
TGAACCT ATT 
AACTATGT AT 



GCGACCGGTG 
TGGGCACGAG 
CTGAA ACA AG 
TCTTATCA AG 
AGGTGGCCTT 
CAGTCACTTG 
CAGCACTTCT 
ATATACCACC 
GCTCTACTAA 
GAGCACAGAT 
AACGATATAG 
TCAGACTGGC 
ATCGGCGTTC 
CCAGAAATCC 
GGCTTAGTTT 
GCCAGGAATC 
CACTGGCCGG 
AATA AACCAT 
AGGAATTCCA 
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T51 A AGG A AT ATC TTGTACCGGA ATTATC AAGT GAAGTAC AGT CTCTTTGGTC 
1001 C A A AGCTGCT GACCTCTTTG AGTCTGAGGG GGCCAA AGTA ATTGAAGTAT 
1051 CCCTTCCTCA CACCAGTT AT TCAATTGTCT GCTACCATGT ATTGTGCACA 
11D1 TCAGA AGTGG CATCGAATAT GGCAAGATTT GATGGGCTAC AATATGGTCA 
1151 CAGATGTGAC ATTGATGTGT CCACTGAAGC CATGTATGCT GCAACCAGAC 
1501 GAGAAGGATT TAATGATGTG GTGAGAGGAA GAATTCTCTC AGGAAACTTT 
1E51 TTCTT ATTAA AAGAAAACTA TGAA AATTAT TTTGTC A A AG CACAGAAAGT 
1301 GAGACGCCTC ATTGCTAATG ACTTTGTAAA TGCTTTT AAC TCTGGAGTAG 
1351 ATGTCTTGCT AACTCCCACC ACCTTGAGTG AGGCAGTACC AT'ACTTGGAG 
1M01 TTCATC AAAG AGGACAACAG AACCCG AAGT GCCCAGGATG ATATTTTTAC 
mSl AC AAGCTGTA AATATGGC AG GATTGCCAGC AGTGAGTATC CCTGTTGCAC 
1501 TCTCA A ACCA AGGGTTGCC A ATAGGACTGC AGTTTATTGG ACGTGCGTTT 
1551 TGTGACCAGC AGCTTCTTAC AGTAGCCAAA TGGTTTGAAA AACAAGTACA 
lt.01 GTTTCCTGTT ATTCAACTTC AAGAACTCAT GGATGATTGT TCAGCAGTCC 
1L.51 TTGAA A ATGA AAAGTTAGCC TCTGTCTCTC TAA AAC AGTA AACATATCTT 
1701 ACAAATTAAA ATGACTTTTA GGCTGGGTGC AGTGGCTCAC ACCTGTAATC 
1751 CCAGCACTTT GGGAGGCCAA GGCGAGCGGA TCATGAGGTC AGAAGATCTA 
IfiOl GAACAGCCTG GTCAACATGG TGAA ACCCCG TCTCTACTAA AAATACAAAA 
lfi51 ATTAGCCAGG CTTAGTGGCG GGCATCTGTA GTCCCAGCTA CTCAGGAGGC 
1101 TGAGGCAGGA GAATCACTTG AACCCTGGAG GTGGAGGTTG CAGTGAGCCG 
1^51 AG ATC ATGCC ACTGCACTGC ACTCCAGCCT GGGTGACAAA GCAAGACTGT 
E001 GTCTC A AAAT AAATAAATAA AATAAAATAA AATGACGTAC AGAGATTCTA 
E051 TATTCTAGAG AGTCAAATGG TCTTGCTCAA TTCTTGTAAT TAGGTTCTTG 
E101 TTAAT ACAGT CATTCCATGG AATTACTTTT TAA AATTCCT GTGACAATTA 
S151 ATAATAAATA ACGTGTCAGC ATTT AGTAAG CATCCACTAA GTGTACAATA 
EED1 CTTCTACAAT AACACAAGAT ACCTGTTCCT CAAAGACAAT GCATTCTGCC 
EE51 ATAATGTTCA TTAAAGAGTT TACAGTAAAA ATAAGATTAG GGATAAACTT 
E301 CTCAA A AATT GTACATCTGT GTAACTAAAG CACTAAC AAA AACATGAATA 
E351 GTCCTTCTAG AGGTA ACTTG GATAGCCTAG GCAGGC A ACT TATCATGTGG 
E^tOl TGAAGGCCGC CTCAGGGGTT GTTAAAAATG CACAGAAACA ATTGAGTGCG 
EM51 ATTATTGGCT TCTGAGCGCT GAGCAGAGCA GGTGGAAGAG GAACTTTGAG 
E5D1 CACAGG AGGA AATGCAACC A GTCAGGGCCC AGAATCATGC A A ATCTCAGG 
E551 GGTATGCCTC TCTGGGGAGG AGCTCCACTT GCAGGGACTC CTTTTATTTC 
EbOl CCTAAG AAAG AGCTGAAATG ACTGAGAACT TTCCTTTCCT CCTTAGAGTT 
Et51 ACAATTTTAC TTCTGCTATT CCGGAGCCCA TGCCTAGAAG CCAGA ACAAC 
E701 TCCATGTTAC ACTGAGTTCA TGCTCCTATT TACTGATCAC A AATGAGCTC 
E751 ATTAATGTCA TCGAAACATT TATTGTAACC TAACAGACCA TCACAGATTG 
EflOl GAAACTTGGT AGATAGCAGA GCATGGTATT AGTGAA AAAG GTTCA AAATA 
£fl51 CACAAGTAAC ATACACTCTG AAAAACATGC AGATAATTTG CTGATGAAGC 
E^Dl AGAAGAGGGG ATGCGCATGG CAAGAACTTG CCTTACCCCA GATTCTCTAT 
ETSl ATCTC ATGGT TTCCTTTTCC TCTTGACTGT CTTTACG AGT GTTTTTTATT 
3001 TGGGACCCTC GAGCCCAGAG ATATTA ATGG AT ATCTGTAT TCAATATTTG 
3051 ACAAAATCTA ATGGAAACC A TCCATTTACT CATGATAAGG CTTCATCACT 
3101 GGATTTCTGT GTCTTCACTA GAACACCATT GTCATCTCAT ATTGATCAGG 
3151 TATTTTAATC TAGCACTTAC ATATTGTTGA TAAATGA AAG CTGAATTGTT 
3ED1 ACTTA ATAAA TTCACTTTGT TTAGCA AAAA AAAAAA A A AA A AAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 
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Peptide information for frame 3 

ORF from 105 bp to 1688 bp; peptide length: 528
Category: similarity to known protein
Classification: Protein management
Prosite motifs: ATP_GTP_A (112-119)
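
The reading frame and peptide length follow arithmetically from the ORF coordinates. The sketch below assumes 1-based, inclusive coordinates that exclude the stop codon, and reads the damaged end coordinate as 1688; both are assumptions, chosen because they reproduce the frame and length stated in the report. The helper `orf_summary` is hypothetical.

```python
def orf_summary(start, end):
    """Return (reading_frame, codon_count) for a 1-based inclusive ORF."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    frame = (start - 1) % 3 + 1  # frames numbered 1, 2, 3 from insert start
    return frame, length // 3


print(orf_summary(105, 1688))  # (3, 528)
```

The same arithmetic can be applied to the other ORF records in this listing to cross-check coordinates that are hard to read.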



1 MLGRSLREVS 

51 (3 AEESEKRYK 

15 101 NATVVtfKLLD 

151 KRKdNPHSEN 

201 AHCGLVGFKP 

251 DPRDSTTVHE 

3D1 A ADLFESEGA 

20 351 CDIDVSTEAN 

M01 RLIANDFVNA 

M51 A VNNAGLPA V 

501 PVK3L(3ELI1P 



AALKC2GC3ITP 
NGCSLGDLDG 
tfGALLMGKTN 
EDSDWLITGG 
SYGLVSRHGL 
PINKPFP1LPS 
KVIEVSLPHT 
YAATRREGFN 
FNSGVDVLLT 
SIPVALSNt2G 
DCSAVLENEK 



TELCflKCLSL 
IPIAVKDNFS 
LDEFAMGSGS 
SPGGSAAAVS 
IPLVNSMDVP 
LADVSKLCIG 
SYSIVCYHVL 
DVVRGRILSG 
PTTLSEAVPY 
LPIGLflFIGR 
LASVSLKfi 



IKK AKFLNA Y 
TSGIETTCAS 
TDGVFGPVKN 
AFTCYAALGS 
GILTRCVDDA 
IPKEYLVPEL 
CTSEVASNMA 
NFFLLKEN YE 
LEFIKEDNRT 
AFCD(3(3LLTV 



ITVSEE VALK 
NMLKGYIPPY 
PUSYSKRYRE 
DTGGSTRNPA 
AIVLGALAGP 
SSEVQSLWSK 
RFDGLtfYGHR 
NYFVKAtfKVR 
RSA<2DDIFT<3 
AKWFEK<2V(3F 






BLASTP hits 






No BLASTP hits available 



Alert BLASTP hits for DKFZphfbr2_78c12, frame 3



PIR:F70322 glutamyl-tRNA (Gln) amidotransferase subunit A -
Aquifex aeolicus, N = 2, Score = 620, P = 4.3e-89

>PIR:F70322 glutamyl-tRNA (Gln) amidotransferase subunit A -
Aquifex aeolicus

Length = 478

HSPs: 

45 Score = b20 (T3-0 bits). Expect = M-Se-fl^n Sum P(2) = M-Se-fi^ 
Identities = 135/31T Positives = 115/311 (blX) 

fluery: 167 

ALGSDTGGSTRNPAAHCGLVGFKPSYGLVSRHGLIPLVNSHDVPGILTRCVDDAAIVLGA 2Mb 
50 +LGSDTGGS R PA+ CG++G KP + YG VSR+GL+ + S+D G+ R 

+D A+VL 
Sbjct: lb3 

SLGSDTGGSIR<2PASFCGVIGIKPTYGRVSRYGLVAFASSLDt2IGVFGRRTE])VALVLEV 222 
55 Query: EH7 

LAGPDPRDSTTVHEPINKPFnLPSLADVSKLCIGIPKEYLVPELSSEVt3SLUSKAADLFE 3Db 
++G D +DST+ P+ + + +V L IG+PKE+ EL +V+ + 

E 
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Sbjct: 2S3 ISGUDEKDSTSAKVPVPE- 
LJSEEVKKEVKGLKIGLPKEFFEYELOPCVKEAFENFIKELE Efll 

Query'' 307 

5 SEGAKVIEVSLPHTSYSIVCYHVLCTSEVASNMARFDGLfiYGHRCDIDVSTEAflYAATRR 3bb 

EG ++ EVSLPH YSI Y+++ SE +SN+ AR+DG++ YG+R 

NYA TR 

Sbjct: 2fl2 

KEGFEIKEVSLPHVKYSIPTYYIIAPSEASSNLARYDGVRYGYRAKEYKDIFEPIYARTRD 341 

10 

Query* 3b7 

EGFNDVVRGRILSGNFFLLKENYENYFVKAiSKVRRLIANDF VNAFNSGVDVLLTPTTLSE 4Eb 
EGF V+ RI+ G F L Y+ Y + + K AtfKVRRLI NDF + AF VDV+ 

+ PTT 

15 Sbjct: 34E EGFGPEVKRRII1LGTFALSAG YYD A YYLK AdKVRRLITNDFLK AFEE- 
VDVIASPTT--P 3=1<3 

Query- 427 

AVPYLEFIKEDNRTRSA(3DDIFT(3AVNriAGLPAVSIPVALSN(3GLPIGLfiFIGRAFCDt3(3 4&b 
20 +P+ + +N DI T N+AGLPA+SIP+A + GLP + G Q 

IG+ + + 

Sbjct: -3=5=1 TLPFKFGERLENPIEriYLSDILTVPANLAGLPAISIPIAUKD- 
GLPVGGflLIGKHWDETT 457 

25 tfuery: 4S7 LLTVAK-WFEKGVgFPVIGL 505 

LL ++ lil +K + I L 
Sbjct: 45fl LLflISYLUE(3KFKHYEKIPL 477 

Score = EAT (43-4 bits)-, Expect = 4-36-6=5-, Sum P(E) = 4.3e-fi=) 
30 Identities = b4/143 (445c)-, Positives = =30/143 (b2*) 

Query: 4 RSLREVSAALKflG(2ITPTELCt3KCLSLIKKAKF- 

LNAYITVSEEVALK<3AEESEKRYKNG bE 

+SL E + LK+G+++P E+ + + + + AYIT ALKtfAE 

35 + + R 

Sbjct: 5 

KSLSELRELLKRGEVSPKE VVESFYDRYN(2TEEK VKAYITPLYGKALK(2AESLKER bQ 

fluery: b3 

40 (3SLGI)LDGIPIAVKDNFSTSGIETTCASNriLKGYIPPYNATVV(3lCLL]>(3GALLnGKTNLD 1EE 

L L GIPIAVKDN G +TTCAS +L+ ++ PY+ATV+++L 

GAL++GKTNLD 
Sbjct: bl -EL- 

PLFGIPIAVKDNILVEGEKTTCASKILENFVAPYDATVIERLKKAGALIVGKTNLD llfi 

45 

fluery: 123 EFANGSGSTDGVFGPVKNPUSYSK 14b 

EFAMGS + F P KNPU + 
Sbjct: 11^ EFAMGSSTEYSAFFPTKNPWDLER 14S 
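
The Identities figure of an HSP counts the alignment columns in which query and subject carry the same residue. A minimal sketch of that count is given below for the last Query/Sbjct pair of this HSP. The two strings are a reading of the OCR-damaged lines (for example "N" taken as M and "U" as W), and `alignment_stats` is a hypothetical helper; the percentage is rounded down here as a simplifying assumption.

```python
def alignment_stats(query, subject):
    """Return (identities, aligned_columns, percent_identity) for two
    equal-length aligned strings; '-' denotes a gap and never counts
    as an identity."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    aligned = len(query)
    identities = sum(
        1 for q, s in zip(query, subject) if q == s and q != "-"
    )
    return identities, aligned, 100 * identities // aligned


# Assumed reading of the final alignment segment (query residues 123-146).
query = "EFAMGSGSTDGVFGPVKNPWSYSK"
subject = "EFAMGSSTEYSAFFPTKNPWDLER"
print(alignment_stats(query, subject))  # (12, 24, 50)
```

Summing such counts over all segments of an HSP gives the totals reported on its Identities line.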

50 

Pedant information for DKFZphfbr2_78c12, frame 3

Report for DKFZphfbr2_78c12.3




CLENGTHJ 52fl 
EMIO 574bfi-7a 
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EpIJ 5-57 

EH0I10L3 PIR:E717E5 glutamyl -tRNA amidotransf erase chain A 

(gatA) RP15E - Rickettsia prowazekii Ee-T3 

EFUNCATJ r general function prediction CPU jannaschiin 

5 MJllfaCO fie-bl 

ICFL1NCAT3 OLOE-Ol nitrogen and sulphur utilization ES- 
cerevisiae«i YnRE^ScH le-55 

EFUNCAT3 c energy conversion EM. genitaliumn MGOnH ile-M^ 

EFUNCATID Dl-Dl-10 amino-acid degradation ES. cerevisiae-i 
10 YBR2Dac3 Be-31 

EFUNCATJ 01.03-01 purine-ribonucleotide metabolism ES- 
cerevisiae-i YBREOflcJ Ee-31 
EBL0CKS3 BLCJ0571 

EECJ t-3-M-b Urea carboxylase 5e-30 

15 EEO 3.5-m Amidase 3e-3T 

EEO 3.5-E-1E b-Aminohexanoate-cyclic-dimer hydrolase le-17 

EPIRKUJ ligase 5e-3D 

EPIRKUJ transmembrane protein 5e-30 

EPIRKWl ATP 5e-30 

20 EPIRKliO crown gall tumor le-E^ 

EPIRKliO mitochondrion Ee-13 

EPIRKliO purine nucleotide binding 5e-3D 

EPIRKUJ P-loop 5e-30 

EPIRKUJ hydrolase 36-3^ 

25 EPIRKUJ biotin 5e-3D 

ESUPFAMJ amidase 3e-3^ 

ESUPFAMJ biotin carboxylase homology 5e-3D 
ESUPFAMJ indoleacetamide hydrolase 7e- c lE 
ESUPFAMJ lipoyl/biotin-binding homology 5e-30 
30 EPR0SITEJ ATP_GTP_A 1 
EKUJ AlphaJBeta 
EKUJ L0U_C0MPLEXITY E • Mfc 

35 SE<2 riL6RSLREVSAALK(36t3ITPTELC(2KCLSLIKKAKFLNAYITVSEEVALKj3AEESEKRYK 
SEG 

PRD ccchhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SE(2 NG(3SLGI>LI>SIPIAVKPNFSTSCIETTCASNHLKGYIPPYNATVV(3KLLD(3GALLri6KTN 
40 SEG . 

PRD hccccccccccceeeecccccccccccchhhhhhhcccccchhhhhhhhhccceeeeccc 

SE(3 LDEFAMGSGSTDGVFGPVKNPUSYSKRYREKRKdNPHSENEDSDULITGGSPGGSAAAVS 

SEG xxxxxxxxxxxx 

45 PRD ccccccccccccccccccccccccceeecccccccccccccccccccccccccccccchh 

SEG AFTCYAALGSDTGGSTRNPAAHCGLVGFKPSYGLVSRHGLIPLVNSHDVPGILTRCVDDA 
SEG x 

PRD hhhheeeecccccccccccccceeeecccccceeeeccceeeeecccccccccchhhhhh 

50 

SE(2 AIVLGALAGPDPRDSTTVHEPINKPFMLPSLADVSKLCIGIPKEYLVPELSSEVtJSLUSK 

SEG 

PRD hhhhhhhccccccccccccccccccccccccccccceeeecccccccccchhhhhhhhhh 

55 SE(2 AADLFESEGAKVIEVSLPHTSYSIVCYHVLCTSEVASNriARFDGLcSYGHRCDIDVSTEAM 

SEG 

PRD hhhhhhhhcceeeeeeccccceeeeeeeeehhhhhhhhhhhhhcccceeeccchhhhhhh 
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SE(2 YAATRREGFNDVVRGRILSGNFFLLKENYENYFVKAtfKVRRLIANDFVNAFNSGVDVLLT 

SEC 

PRD hhhhhhcccchhhhhhhhhhheeeccccchhhhhhhhhhhhhhhh 

5 SE<3 PTTLSEAVPYLEFIKEDNRTRSA(3DI>IFT(3AVNI1AGLPAVSIPVALSN(3GLPIGL(3FIGR 

SEG 

PRD cccccccccccccccccccccccccceeeeccccccccccccccccccccccceeeeeec 

SE<2 AFCD(3(2LLTVAK:UFEK(2VC3FPVI(2Lt2ELf1DDCSAVLENEKLASVSLK(3 

10 SEG 

PRD cccchhhhhhhhhhhhhhhhhheeehhhhhhheeeeccccceeeeccc 



Prosite for DKFZphfbr2_78c12.3

PS00017   112->120   ATP_GTP_A   PDOC00017

(No Pfam data available for DKFZphfbr2_78c12.3)
DKFZphfbr2_78d18

group: brain derived 



DKFZphfbr2_78d18 encodes a novel 535 amino acid protein with weak
similarity to a human putative mitogen-activated protein kinase
kinase kinase.

No informative BLAST results; no predictive Prosite, Pfam or SCOP
motifs.

The new protein can find application in studying the expression
profile of brain-specific genes.

similarity to putative mitogen-activated protein kinase kinase
kinase (Homo sapiens)

Sequenced by MediGenomix
Locus: unknown

Insert length: 2158 bp

Poly A stretch at pos. 2138, polyadenylation signal at pos. 2117
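
The poly A stretch and the polyadenylation signal of an insert can be located programmatically. The following is a minimal sketch, not the procedure used to produce this listing: it assumes the canonical AATAAA hexamer as the signal and treats a terminal run of at least eight A residues as the poly A stretch. Both choices, and the helper name `find_polya_features`, are illustrative assumptions. Offsets are 0-based and relative to the excerpt, which is the last 58 nucleotides of the insert listed below.

```python
import re

POLYA_SIGNAL = "AATAAA"  # canonical hexamer only; variants such as ATTAAA are ignored


def find_polya_features(seq, min_run=8):
    """Return (signal_offset, polya_offset) as 0-based offsets, or None where absent."""
    seq = seq.upper().replace(" ", "")
    # Terminal poly(A): the run of A residues that ends the insert.
    match = re.search(r"A{%d,}$" % min_run, seq)
    polya = match.start() if match else None
    # Last canonical signal lying entirely upstream of the poly(A) stretch.
    limit = polya if polya is not None else len(seq)
    signal = seq.rfind(POLYA_SIGNAL, 0, limit)
    return (signal if signal >= 0 else None), polya


# Last 58 nt of the insert (1-based positions 2101 onward in the listing).
tail = "GCCTCAGTTG CTGCTGTAAT AAAAGTCTAC TTTTTGCCAA AAAAAAAAAA AAAAAAAA"
signal, polya = find_polya_features(tail)
print(signal, polya)
```

For this excerpt the signal is found at offset 17. Adding the excerpt's 0-based start (2100) gives 2117, the signal position reported above, which suggests that positions in this listing are 0-based offsets. That reading is an inference from the data, not something the listing states.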



   1 ATCCGGGGCC CCGGAACCCG AGCTGGAGCT GAAGCGCAGG CTGCGGGGCG
  51 CGGAGTCGGG AGTGCAGGCC TGAGTGTTCC TTCCAGCATG TCGGAGGGGG
 101 AGTCCCAGAC AGTACTTAGC AGTGGCTCAG ACCCAAAGGT AGAATCCTCA
 151 TCTTCAGCCC CTGGCCTGAC ATCAGTGTCA CCTCCTGTGA CCTCCACAAC
 201 CTCAGCTGCT TCCCCAGAGG AAGAAGAAGA AAGTGAAGAT GAGTCTGAGA
 251 TTTTGGAAGA GTCGCCCTGT GGGCGCTGGC AGAAGAGGCG AGAAGAGGTG
 301 AATCAACGGA ATGTACCAGG TATTGACAGT GCATACCTGG CCATGGATAC
 351 AGAGGAAGGT GTAGAGGTTG TGTGGAATGA GGTACAGTTC TCTGAACGCA
 401 AGAACTACAA GCTGCAGGAG GAAAAGGTTC GTGCTGTGTT TGATAATCTG
 451 ATTCAATTGG AGCATCTTAA CATTGTTAAG TTTCACAAAT ATTGGGCTGA
 501 CATTAAAGAG AACAAGGCCA GGGTCATTTT TATCACAGAA TACATGTCAT
 551 CTGGGAGTCT GAAGCAATTT CTGAAGAAGA CCAAAAAGAA CCACAAGACG
 601 ATGAATGAAA AGGCATGGAA GCGTTGGTGC ACACAAATCC TCTCTGCCCT
 651 AAGCTACCTG CACTCCTGTG ACCCCCCCAT CATCCATGGG AACCTGACCT
 701 GTGACACCAT CTTCATCCAG CACAACGGAC TCATCAAGAT TGGCTCTGTG
 751 GCTCCTGACA CTATCAACAA TCATGTGAAG ACTTGTCGAG AAGAGCAGAA
 801 GAATCTACAC TTCTTTGCAC CAGAGTATGG AGAAGTCACT AATGTGACAA
 851 CAGCAGTGGA CATCTACTCC TTTGGCATGT GTGCACTGGA GATGGCAGTG
 901 CTGGAGATTC AGGGCAATGG AGAGTCCTCA TATGTGCCAC AGGAAGCCAT
 951 CAGCAGTGCC ATCCAGCTTC TAGAAGACCC ATTACAGAGG GAGTTCATTC
1001 AAAAGTGCCT GCAGTCTGAG CCTGCTCGCA GACCAACAGC CAGAGAACTC
1051 CTGTTCCACC CAGCATTGTT TGAAGTGCCC TCGCTCAAAC TCCTTGCGGC
1101 CCACTGCATT GTGGGACACC AACACATGAT CCCAGAGAAC GCTCTAGAGG
1151 AGATCACCAA AAACATGGAT ACTAGTGCCG TACTGGCTGA AATCCCTGCA
1201 GGACCAGGAA GAGAACCAGT TCAGACTTTG TACTCTCAGT CACCAGCTCT
1251 GGAATTAGAT AAATTCCTTG AAGATGTCAG GAATGGGATC TATCCTCTGA
1301 CAGCCTTTGG GCTGCCTCGG CCCCAGCAGC CACAGCAGGA GGAGGTGACA
1351 TCACCTGTCG TGCCCCCCTC TGTCAAGACT CCGACACCTG AACCAGCTGA
1401 GGTGGAGACT CGCAAGGTGG TGCTGATGCA GTGCAACATT GAGTCGGTGG
1451 AGGAGGGAGT CAAACACCAC CTGACACTTC TGCTGAAGTT GGAGGACAAA
1501 CTGAACCGGC ACCTGAGCTG TGACCTGATG CCAAATGAGA ATATCCCCGA
1551 GTTGGCGGCT GAGCTGGTGC AGCTGGGCTT CATTAGTGAG GCTGACCAGA
1601 GCCGGTTGAC TTCTCTGCTA GAAGAGACCT TGAACAAGTT CAATTTTGCC
1651 AGGAACAGTA CCCTCAACTC AGCCGCTGTC ACCGTCTCCT CTTAGAGCTC
1701 ACTCGGGCCA GGCCCTGATC TGCGCTGTGG CTGTCCCTGG ACGTGCTGCA
1751 GCCCTCCTGT CCCTTCCCCC CAGTCAGTAT TACCCTGTGA AGCCCCTTCC
1801 CTCCTTTATT ATTCAGGAGG GC1GGGGGGG CTCCCTGGTT CTGAGCATCA
1851 TCCTTTCCCC TCCCCTCTCT TCCTCCCCTC TGCACTTTGT TTACTTGTTT
1901 TGCACAGACG TGGGCCTGGG CCTTCTCAGC AGCCGCCTTC TAGTTGGGGG
1951 CTAGTCGCTG ATCTGCCGGC TCCCGCCCAG CCTGTGTGGA AAGGAGGCCC
2001 ACGGGCACTA GGGGAGCCGA ATTCTACAAT CCCGCTGGGG CGGCCGGGGC
2051 GGGAGAGAAA GGTGGTGCTG CAGTGGTGGC CCTGGGGGGC CATTCGATTC
2101 GCCTCAGTTG CTGCTGTAAT AAAAGTCTAC TTTTTGCCAA AAAAAAAAAA
2151 AAAAAAAA



BLAST Results

No BLAST result

Medline entries

No Medline entry

Peptide information for frame 1

ORF from 88 bp to 1692 bp; peptide length: 535
Category: similarity to unknown protein
Classification: Protein management

   1 MSEGESQTVL SSGSDPKVES SSSAPGLTSV SPPVTSTTSA ASPEEEEESE
  51 DESEILEESP CGRWQKRREE VNQRNVPGID SAYLAMDTEE GVEVVWNEVQ
 101 FSERKNYKLQ EEKVRAVFDN LIQLEHLNIV KFHKYWADIK ENKARVIFIT
 151 EYMSSGSLKQ FLKKTKKNHK TMNEKAWKRW CTQILSALSY LHSCDPPIIH
 201 GNLTCDTIFI QHNGLIKIGS VAPDTINNHV KTCREEQKNL HFFAPEYGEV
 251 TNVTTAVDIY SFGMCALEMA VLEIQGNGES SYVPQEAISS AIQLLEDPLQ
 301 REFIQKCLQS EPARRPTARE LLFHPALFEV PSLKLLAAHC IVGHQHMIPE
 351 NALEEITKNM DTSAVLAEIP AGPGREPVQT LYSQSPALEL DKFLEDVRNG
 401 IYPLTAFGLP RPQQPQQEEV TSPVVPPSVK TPTPEPAEVE TRKVVLMQCN
 451 IESVEEGVKH HLTLLLKLED KLNRHLSCDL MPNENIPELA AELVQLGFIS
 501 EADQSRLTSL LEETLNKFNF ARNSTLNSAA VTVSS
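
A peptide listing of this kind can be cross-checked against its cDNA by translating the open reading frame. The sketch below assumes the standard genetic code and 1-based, inclusive ORF coordinates; `translate_orf` is a hypothetical helper, and the start position 88 is taken from where the first ATG of the listed peptide falls in the first 150 nucleotides of the insert. It illustrates the check and is not the annotation software behind the listing.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate_orf(seq, start, end):
    """Translate seq[start..end] (1-based, inclusive) using the standard code."""
    orf = seq.upper().replace(" ", "")[start - 1:end]
    if len(orf) % 3:
        raise ValueError("ORF length is not a multiple of three")
    # Codons containing non-ACGT characters are reported as X.
    return "".join(CODON_TABLE.get(orf[i:i + 3], "X") for i in range(0, len(orf), 3))


# First 150 nt of the insert, as listed above.
head = (
    "ATCCGGGGCC CCGGAACCCG AGCTGGAGCT GAAGCGCAGG CTGCGGGGCG"
    "CGGAGTCGGG AGTGCAGGCC TGAGTGTTCC TTCCAGCATG TCGGAGGGGG"
    "AGTCCCAGAC AGTACTTAGC AGTGGCTCAG ACCCAAAGGT AGAATCCTCA"
)
print(translate_orf(head, 88, 150))
```

The 63 nucleotides from position 88 to 150 give the first 21 residues of the peptide. The same arithmetic ties the coordinates to the peptide length: 535 residues occupy 1605 nucleotides, so an ORF starting at 88 ends at 1692.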



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphfbr2_78d18, frame 1

20 TREMBL: ACDOTMtS_m gene: "TTJ1M -m"n product: "putative mitogen 
activated protein kinase kinase"; Arabidopsis thaliana 
chromosome III 

BAC T1J14 genomic sequence-! complete sequence- 1 N = l-i Score = 
375 i P = 
25 l-le-33 

TREF1BL : AFm5bTD_l gene: "BcDNA . LD5flbS7"n product: 
"BcDNA - LD5Bb57"T 

Drosophila melanogaster clone LD2flb57 BcDNA . LDEBL.57 
30 (BcDNA • LD2flb57) 

mRNAi complete cds.i N = 1 ■> Score = 1140-i P = 1.3e-115 

PIR:TD5T51 probable mitogen activated protein kinase - rice-i N s 

35 Score = 311, P = LMe-35 

>TREPIBL:AFmSb , lD_l gene: "BcDNA - LDSflbS?"* product: 
"BcDNA • LD2flb57"* 
40 Drosophila melanogaster clone LD5fib57 BcDNA • LD5Ab57 

( BcDNA >LD5fib 57) mRNA, 
complete cds- 

Length = b37 

45 HSPs: 



Score = HMO (171. □ bits)-. Expect = l-3e-115-> P = l-3e-115 
Identities = S30/i4b5 ( -» Positives - 304/MbS (bSX) 

50 fluery: bl 

CGRUfiKRREEVNflRNVPGIDSAYLAMDTEEGVEVVUNEVflFSERKNYKLflEEKVRAVFDN 150 
CGRliI KRREEV+flR+VPGID +LAMDTEEGVE VVblNEV(2++ + K 

OEEK+R VFDN 
Sbjct: 105 

55 CGRULKRREEVDt2RDVPGIDCVHLAMDTEEGVE VVUNEV(2YASLt!ELKS(3EEKI1R(3VFDN lbl 

fluery: 151 LlflLEHLNIVKFHKYUADIKE- 
NKARVIFITEYMSSGSLKflFLKKTKKNHKTflNEICAWKR 17T 
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L+flL+H NIVKFH+Yld D ++ + RV+FITEYI1SSGSLKQFLK+TK + N K 

+ ++U+R 
Sbjct: IbS 

LL<3LDH(2NIVKFHRYWTI>T(}<3AERPRVVFITEYnSSGSLK(3FLKRTKRNAKRLPLESliJRR 231 

5 

fluery: 180 

h)CTd3ILSALSYLHSCDPPIIHGNLTCDTIFI(2HNGLIKIGSVAP]>TINNHVKTCREE(2KN 531 
UCTOILSALSYLHSC PPIIHGNLTCD+IFK2HNGL+KIGSV PD ++ V+ 

RE ++ 
10 Sbjct: ESS 

UCTflILSALSYLHSCSPPIIHGNLTC]>SIFI(3HNGLVICIGSVVP]>AVHYSVRRGRERERE Sfil 

Query: SMO LHFF-APEYGEVTNVTTAVDIYSFGriCALEriAVLEIfS- 

GNGESSYVP<2EAISSAI<S 513 
15 H+F APEYG +T A+DIY+FGMCALEMA LEId N ES+ + 

+E I I 
Sbjct: SSS 

RERGAHYFflAPEYGAADCJLTAALDIYAFGflCALEIIAALEIflPSNSESTAINEETIflRTIF 341 

20 fluery: S=JM LLEJ>PL«2REFI<3KCL<JSEPARRPTARELLFHPALFEVPSLKLLAAHCIV 

GHflHMIPE 350 

LE+ LflR+ I+KCL +P RP+A +LLFHP LFEV SLKLL AHC+V 

++ M E • 
Sbjct: 345 

25 SLENDLflRBLIRKCLNPflPflDRPSANDLLFHPLLFEVHSLKLLTAHCLVFSPANRTMFSE 401 
fiuery: 351 NALEEITKNH- 

DTSAVLAEIPAGPGREPV<3TLYS(3SPALELI>KFLE1>VRNGIYPLTAFGL 401 

A + + + V+A++ G+E L S A +L+KF+EDV+ 

30 G+YPL + 

Sbjct: 105 

TAFDGLMflRYYflPDVVMAflLRLAGGOERdYRLADVSGADKLEKFVEDVKYGVYPLITYS- 4bO 
Query- MID 

35 PRXXXXXXXXXXXXXXXXXXXXXXXXXAEVETRKVVLIIiaCNIESVEEGVXXXXXXXXXXX 4b1 

+ + E+R++V 11 C+++ E+ 

Sbjct: 4bl 

GKKPPNFRSRAASPERADSVKSATPEPVDTESRRIVNIinCSVKIKEDSNDITnTILLRriD 550 

40 Query: 470 XXXXXXXSCDLMPNENIPELAAELVflLGFISEADUSRLTSLLEETL 515 

+C + N+ +L +ELV+LGF+ D(J ++ LLEETL 
Sbjct: 551 PKMNRfiLTCflVNENDTAADLTSELVRLGFVHLDDflDKIflVLLEETL Sbb 

Pedant information for DKFZphfbr2_78d18, frame 1

Report for DKFZphfbr2_78d18.1




CLENGTH3 5b4 
cmo. b34b4.B7 



EpIJ 5.10 

IH0I10L3 TREMBL : AFm5b10_l gene: "BcDNA -LDSab5?"* product: 

55 n BcDNA.LI>Sflb57"i Drosophila melanogaster clone LDSBbS? 

BcDNA - LDEflb57 (BcDNA-LDEflb57) mRNAi complete cds. le-153 
CFUNCAT3 03-55 cell cycle control and mitosis CS. cerevisiaei 
YJLDISwI be-15 
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EFUNCAT3 3D- 03 organization of cytoplasm ES. cerevisiae-. 
YJL0T5w3 be-15 

EFUNCATID 11-01 stress response ES. cerevisiae-. Y JLDTSwID fc.e-15 
EFUNCAT3 03-01 cell growth ES- cerevisiae-. YJL0^5wJ be-15 
5 EFUNCAT2 10-02-11 key kinases ES- cerevisiae-. YJL0T5w3 L.e-15 

EFUNCAT3 D3.DI* budding-, cell polarity and filament formation 

ES. cerevisiae-, YJLD15w3 be-15 
EFUNCATU classification not yet clear-cut ES. cerevisiae-. 

YLRO^bwID 2e-0^ 

10 EFUNCATID 30-02 organization of plasma membrane ES. cerevisiae-. 
YLRDTbwJ Se-DT 

EFUNCATID 10.03-11 key kinases ES. cerevisiae-. YNR031c3 Se-Q^ 

EFUNCAT3 0T-01 biogenesis of cell wall ES. cerevisiae-. 

YNR031cJ 3e-m 

15 EFUNCATID 03-0? pheromone response-, mating-type determination! 
sex-specific proteins ES- cerevisiae-. YLR3t.2w3 Me-Ofl 
EFUNCATID 10-05.11 key kinases ES- cerevisiae-, YLR3bEw3 Me-Ofi 

EFUNCAT3 10-0H.11 key kinases ES- cerevisiae-. YLR3bEw3 Me-06 

EFUNCATID ll-0 l 4 dna repair (direct repair-, base excision repair 

20 and nucleotide excision repair) ES- cerevisiae-. YPL153c3 le-07 
EFUNCATID 03-1T recombination and dna repair ES. cerevisiae-. 

YPL153cID le-07 

EFUNCATID 03-22-01 cell cycle check point proteins ES- 
cerevisiae-. YPL153c3 le-07 
25 EFUNCATID 30-10 nuclear organization ES - cerevisiae-. YPL153c3 
le-07 

EFUNCATID 03-25 cytokinesis ES- cerevisiae-. YDR507c3 le-07 
EFUNCAT3 10. =n other signal-transduct ion activities ES- 
cerevisiae-. YPL153c3 le-07 
30 EFUNCAT3 03-13 meiosis ES- cerevisiae-. YDRS23c3 3e-07 

EFUNCAT3 03-10 sporulation and germination ES- cerevisiae-. 
YDR523c3 3e-07 

EFUNCAT3 03-lb dna synthesis and replication ES. cerevisiae-. 

YMROOlcJ 2e-0b 

35 EFUNCAT3 11 unclassified proteins ES. cerevisiae-. YDRM^Ocl 

3e-05 

EFUNCATJ 05-07 translat i onal control ES- cerevisiae-^ YDR2fi3c3 
le-OM 

EFUNCATJ 01.05.0i* regulation of carbohydrate utilization ES- 
40 cerevisiae-. YDRM77w3 le-OM 
EBL0CKS3 PF00b37A 
EBL0CKS3 BPD3niJ 
EBL0CKS3 PF01317B 

ESC0P3 dlir3a_ 5-1-1-2-b insulin receptor Complex 

45 (transf erase/substrate) 2e-53 

ESC0P3 dlphk 5-1-1-1-b gamma-subuni t of glycogen 

phosphorylase kinas 3e-bfl 

ESC0P3 dlfgkb_ 5-1-1-2.5 Fibroblast growth factor 

receptor 1 Ehuman (Horn le-55 

50 ESC0P3 dlabo S-l-l-l-lM Protein kiase CK2-. alpha 

subunit EMaize (Ze 2e-55 

ESC0P3 d31ck_ 5-1.1.2.2 Lymphocyte kinase (lck) EHuman 

(Homo sapiens) 7e-5M 

ESC0P3 dEerk 5-1-1-1-11 I1AP kinase Erk2 Erat (Rattus 

55 norvegicus) Te-?! 

ESC0P3 dlcdkb_ 5.1-1-1-5 cAMP-dependent PK-. catalytic 

subunit Comple le-55 
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25 



WO 01/98454 

ESC0P3I 
(Homo sapiens) 



EECJ 

CEO 

EPIRKIO 

EPIRKUO 

EPIRKIO 

EPIRKIO 

EPIRKIO 

EPIRKIO 

EPIRKIO 

EPIRKIO 

EPIRKIO 

EPIRKIO 

EPIRKIO 

EPIRKIO 

EPIRKIO 

EPIRKIO 

EPIRKIO 

EPIRKIO 

EPIRKIO 

EPIRKIO 

EPIRKIO 

CSUPFArU 

EsuPFAro 

13 

ESUPFAI1J 



e-7.1 

2.7.1 



prote 
unass 



PCT/IB01/02050 

dlhcl 5.1-1-1.1 Cyclin-dependent PK EHuman 

Me-b7 

-HE Protein-tyrosine kinase Me-Ob 
-37 Protein kinase Se-CH 
phosphotransferase Ee-Efi 
nucleus 3e-0b 
RNA binding 3e-lD 
tandem repeat Me-07 
cell cycle control 3e-Db 

serine/threonine-specif ic protein kinase Ee-13 

transmembrane protein Me-D7 

autophosphorylation 3e-10 

tyrosine-specif ic protein kinase t *e-Db 

magnesium Me-D? 

ATP Ee-13 

receptor 4e-D7 

phosphoprotein Ee-13 

apoptosis 3e-0b 

glycoprotein 4e-07 

protein kinase Ee-Efi 

signal transduction Ee-Dfi 

cell division le-11 

calmodulin binding 3e-0b 

in kinase byrE le-Ob 

igned Ser/Thr or Tyr-specific protein kinases 2e- 



leucine-rich alpha-E-glycoprotein repeat homology Me-07 



ESUPFAN31 double-stranded RNA-binding repeat homology 3e-10 

30 ESUPFArD SAI1 homology le-Ob 

(TSUPFAM3 death-associated protein kinase 3e-0b 

ESUPFANJ ankyrin repeat homology 3e-0b 

ESUPFAMJ protein kinase homology Ee-Efi 

ESUPFANJ kinase-related transforming protein Ee-Ob 

35 ESUPFAPO protein kinase SPKl 3e-0b 

ESUPFAHJ protein kinase XaEl 4e-07 

ESUPFAMJ protein kinase TIK 3e-lD 

ESUPFAMJ kinase interaction domain homology 3e-0b 
EPFAMJ Eukaryotic protein kinase domain 

40 EKIO All_Alpha 

EL K UID 3D 

EKIO L0W_C0I1PLEXITY Ib-HT X 



45 SE<2 IRGPGTRAGAEA(2AAGRGVGSAGLSVPSSMSEGES(3TVLSSGS1>PKVESSSSAPGLTSVS 

SEG - • • -xxxxxxxxxxxxxxxx xxxxx 

IkobA 

50 SE<3 PPVTSTTSAASPEEEEESEDESEILEESPCGRU(3KRREEVN(3RNVPGIl>SAYLAnDTEEG 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

IkobA 



55 SE<3 VEVVUNEV(3FSERKNYKL(3EEJCVRAVF]>NLIt3LEHLNIVKFHICYUAl>I)CENJCARVIFITE 

SEG ... 

IkobA CHHHHHHHHHHHHHHHTTTBTTBCCEE 

EEEETTTEEEEEEC 
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SE(3 YnSSGSLKfiFLKKTKKNHKTNNEKAUKRUCTfllLSALSYLHSCDPPIIHGNLTCDTIFIfl 

SEG r 

IkobA CCCCEEH — HHHHCTTTTC-CCHHHHHHHHHHHHHHHHHHH — 

5 HHCEETTTTTTTTEETT 

SE<2 HNGLIKIGSVAPDTINNHVKTCREEdJKNLHFFAPEYGEVTNVTTAVDIYSFGIICALEMAV 

SEG 

IkobA 

10 TTCCEEECCTTTTEECTTTTEEEEETTTGGGCCHHHHHCCCBCHHHHHHHHHHHHHHHHC 

SE(2 LEI<26NGESSYVP(JEAISSAI<2LLE]>PL<3REFI(3KCL<2SEPARRPTARELLFHPALFEVP 

SEG 

IkobA 

15 CCTTTTCCCHHHHHHHHHHCCCCTTTHHHHHHHHHTTTTTGGGCCCHHHHHHTTTT 

SEfl SLKLLAAHCIVGHdHMIPENALEEITKNMDTSAVLAEIPAGPGREPVflTLYSCiSPALELD 

SEG 

IkobA 

20 



SE<2 KFLEDVRNGIYPLTAFGLPRP<3<3P<3<3EEVTSPVVPPSVKTPTPEPAEvETRKVVI_t1l2CNI 

SEG • xxxxxxxxxxxxxxxxxxxxxxxxx 

IkobA 

25 

SE(2 ESVEE6VKHHLTLLLKLEDKLNRHLSCDLI1PNENIPELAAELV<2LGFISEAD<2SRLTSLL 
SEG xxxxxxxxxxxxxxxxxx 



IkobA 

30 

SE<2 EETLNKFNFARNSTLNSAAVTVSS 

SEG 

IkobA 

(No Prosite data available for DKFZphfbr2_78d18.1)

Pfam for DKFZphfbr2_78d18.1



HI1M_NAI1E Eukaryotic protein kinase domain 
45 WW 

*rLnHPNIIRFYDwFed • • . ddDHIYNHIEYrieGGDLFDYIrrng p 

+L H NI++F ++ D + ++ +I+EYM G+L +++++ + 

fiuery 15S 

QLEHLNIVKFHKYUADIKENKARVIFITEYMSSGSLKOFLKKTKKNHKT EDO 

50 

HMPI 

llsEwelrf IMyfllLrGHeYLHSMg • • IIHRDLKPENILIDeNgqIKIcDF 

n+ E> +++ +AIL++++YLHS IIH L + I+I +NG 

IKI+ 

55 Query 501 

UNEKAUKRUCTfllLSALSYLHSCDPPIIHGNLTCDTIFIflHNGLIKIGSV 250 
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hmm 

GLARqMnnYerMttf CGTPbJYMMAPE VIImgnyYttkVDMUSFGCILUEM 

++ N+ + + + APE + ++ TT+VD++SFG+ 

EM 

5 <2uery 251 APDTINNHVKTCREEdKNLHFF-rAPEY- 

GEVTNVTTAVDIYSFGMCALEM 2^6 

HUM 

IITGepPFyddnMemlmrliqrf rrpf UpnCSeElyDFMrwCWnyDPekRP 
10 ++++ N E +++++++ + ++F+ +C++ 

P++RP 

Query 2^ A — VLEI<2- 

GNGESS YVP(2EAISSAI<2LLEI>PLt3REFI(2KCL(2SEPARRP 3HS 

15 HUM TFr(3ILnHPUF* 

T+R++L HP + 
Query 3Mb TARELLFHPAL 35b 
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DKFZphf br2_7fldM 






group: transmembrane protein

DKFZphfbr2_78d4 encodes a novel 188 amino acid protein without
similarity to known proteins.

The novel protein contains 1 transmembrane region and a
Cytochrome c family heme-binding site.
No informative BLAST results; no predictive Prosite, Pfam or SCOP
motifs.

The new protein can find application in studying the expression
profile of brain-specific genes and as a new marker for amygdala
cells.

weak similarity to hypothetical protein of Arabidopsis thaliana

perhaps complete cds.
Pedant: TRANSMEMBRANE 1

Sequenced by MediGenomix

Locus: unknown

Insert length: 1547 bp

Poly A stretch at pos. 1527, polyadenylation signal at pos. 1508



   1 TTGCCGCCGC CGCCACCCCC GCCCAGGATG GCGGAAGTGG AGGCGCCGAC
  51 GGCGGCCGAG ACGGACATGA AGCAATATCA AGGCTCCGGC GGCGTCGCCA
 101 TGGATGTGGA ACGGAGTCGC TTCCCCTACT GCGTGGTGTG GACGCCCATC
 151 CCGGTGCTCA CGTGGTTTTT CCCCATCATC GGCCACATGG GCATCTGCAC
 201 ATCCACAGGA GTCATTCGGG ACTTCGCGGG CCCCTACTTT GTCTCAGAGG
 251 ACAACATGGC CTTTGGAAAG CCTGCCAAGT ACTGGAAGTT GGACCCTGCT
 301 CAGGTCTATG CTAGCGGGCC CAACGCATGG GACACGGCTG TGCACGACGC
 351 CTCTGAGGAG TACAAGCACC GCATGCACAA TCTCTGCTGT GACAACTGCC
 401 ACTCGCACGT GGCATTGGCC CTGAATCTGA TGCGCTACAA CAACAGCACC
 451 AACTGGAATA TGGTGACGCT CTGCTTCTTC TGCCTGCTCT ACGGGAAGTA
 501 CGTCAGCGTT GGGGCCTTCG TGAAGACCTG GCTGCCCTTC ATCCTTCTCC
 551 TGGGCATCAT CCTCACCGTC AGCCTGGTCT TTAACCTCCG GTGATGGCTG
 601 CTCGGTGGCC CCACACCCAC CAGGGTCCCG AGGAAACAGC CGCCATCCCT
 651 TTTGGTTCCA GATTTTTTTC TCCTCACCCC AAAAGGCAGG GTTGGGCCTG
 701 CTGTTGTGGA CCGGGGGTCG GGGCTGGCAG GATGGAAGGA CTGAGGACCA
 751 GCATGAAGTG GGGGTTTGTT GTCTCCCTGC CTCTCAGAAG CACCCTGTCC
 801 CCTCCTCCCC AGGCCTGTGA CTCCGGCCCT GGAAGCCCCT TTGTTCTTCT
 851 GTTGAAAGGC TTTGGCTTCC CTCTGTAGAG CTGCTCCCGC CACCACCTGC
 901 TGGGGTCCTG CCTCAGCCCA GTGCCCAGTA TGGGGAGAGG AGGACATTTG
 951 GGCTCACCTG TCAAGGTGGC CCTGGGACCA GAGCTGGTCC CAGCATGGGG
1001 TGCACCGGGT ACACTTAACG TGTCTCTATA AGCCAAGTTG CTTCAGGACC
1051 TTCACCACTG GCCTCTAGAA TGGTCCAGAG GGGCTGGCTG GGTCCCTTTG
1101 TCAGACTCCT GCCGGCAGCT GCCCTGGGGG ACATGTGTGC CCATCTGGCA
1151 TCCTCCAGCC CGTGCAGTCC GCTCTTCACT GTTCCACGGC CTCCCAGTGC
1201 CTCCCAGCAT TGGACCCATC TCCCCCTGCA GTTTGAGGCC AGAGAGGTGA
1251 GTGGACCTGA CAAGTGCCAG AGTAACCGTG TAGACAGAGC AGTGTAGACA
1301 GCGCTCAGCC CCAGCCCCAG GTGTGGACCT CATGCTGGTG ATGGCTCCCC
1351 TGGGTGGCCT GCCAGCACAG CCAGTGCCAT CAGGGAGCTG AAGGGGCTGT
1401 CCCCCACCTA ACTCCAGCTC CCCCTTCACG TTGTCACCAA GGCCCTGTGC
1451 CGCCCGCCTC GCCCCCCTGC TCTGTGGATT CCTTTGGGAA GGGCTCCCTG
1501 GGCAGGACAA TAAAGAGTTT TGACTCCAAA AAAAAAAAAA AAAAAAA



BLAST Results 



10 Entry TDEblb from database PIR:- 

hypothetical protein TlTLlfi-12 - Arabidopsis thaliana 
Score = EE^-i P = 1.3e-l?n identities = 57/lbl-. positives 
7fl/lbl-, 
frame +1 



Medline entries

No Medline entry

Peptide information for frame 1

ORF from 28 bp to 591 bp; peptide length: 188
Category: similarity to unknown protein
Classification: no clue

Prosite motifs: CYTOCHROME_C (121-127)

   1 MAEVEAPTAA ETDMKQYQGS GGVAMDVERS RFPYCVVWTP IPVLTWFFPI
  51 IGHMGICTST GVIRDFAGPY FVSEDNMAFG KPAKYWKLDP AQVYASGPNA
 101 WDTAVHDASE EYKHRMHNLC CDNCHSHVAL ALNLMRYNNS TNWNMVTLCF
 151 FCLLYGKYVS VGAFVKTWLP FILLLGIILT VSLVFNLR
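
The cytochrome c heme-binding site reported for this peptide can be illustrated with a simple motif scan. The sketch below uses the simplified consensus C-x-x-C-H rather than the full PROSITE pattern, which additionally excludes certain residues at the variable positions and directly after the histidine. The helper name `find_heme_sites` is hypothetical, and the peptide string is the listing above with apparent OCR substitutions corrected against the nucleotide sequence, which is itself an assumption.

```python
import re

# Simplified heme-binding consensus: Cys, any two residues, Cys, His.
HEME_MOTIF = re.compile(r"C..CH")

PEPTIDE = (
    "MAEVEAPTAAETDMKQYQGSGGVAMDVERSRFPYCVVWTPIPVLTWFFPI"
    "IGHMGICTSTGVIRDFAGPYFVSEDNMAFGKPAKYWKLDPAQVYASGPNA"
    "WDTAVHDASEEYKHRMHNLCCDNCHSHVALALNLMRYNNSTNWNMVTLCF"
    "FCLLYGKYVSVGAFVKTWLPFILLLGIILTVSLVFNLR"
)


def find_heme_sites(peptide):
    """Return (start, end, match) tuples with 1-based, inclusive residue positions."""
    return [(m.start() + 1, m.end(), m.group()) for m in HEME_MOTIF.finditer(peptide)]


print(find_heme_sites(PEPTIDE))
```

The scan finds a single site beginning at residue 121, in line with the start of the Prosite hit reported for this frame.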

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphfbr2_78d4, frame 1

PIR:T02blb hypothetical protein TnLlfl.lE - Arabidopsis thaliana-. 
N = 

E-. Score = ESb-, P = M.5e-21 



>PIR:TQ2blb hypothetical protein Tl^Llfl-lE - Arabidopsis thaliana 
Length = 2fc>7 



55 HSPsj 



Score = ESb bits)-. Expect = M.Se-21-. Sum P(E) = H.5e-21 

Identities = 5E/13E (3^*)-, Positives = 71/13E C53*> 
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Query- 25 

MDVERSRFPYCVVUTPIPVLTWFFPIIGHMGICTSTGVIRDFAGPYFVSEDNMAFGKPAK 

+ D ++S+FP C+VWTP + PV++U P IGH+G+C GVI DFAG F++ D + 

5 AFG PA+ 

Sbjct: bl 

IDTKKSKFPCCIVUTPLPVVSWLAPFIGHIGLCREDGVILDFAGSNFINVDDFAFGPPAR 120 

fluery: AS YWKLDPAflVYASGPNAUDTAVHDASEEYKHRMHNLC — 
10 CDNCHSHVALALNLMRYNNST- 1M1 

Y +LD + PN H +KH DN S + 

YN T 

Sbjct: 1E1 YLGLDRTKCCLP-PNMGG 

HTCKYGFKHTDFGTARTWDNALSSSTRSFEHKTYNIFTC 17b 

15 

fluery: 1MB NUN-MVTLCFFCLLYG 15b 

N + V C L YG 
Sbjct: 177 NCHSFVANCLNRLCYG 115 

20 Score = 1S7 (23-b bits)-, Expect = l.fle-13-, Sum PC2) = l-6e-13 
Identities = 27/61 (335:)-. Positives = SO/fll (blZ) 

(3uery: 101 

udtavhdaseeykhrmhnlccdnchshvalalnlmrynnstnunmvtlcffcllygicyvs IbO 

25 UD A+ ++ ++H+ +N+ NCHS VA LN + Y S UNMV + 

++ GK+++ 
Sbjct: 155 

WDNALSSSTRSFEHKTYNIFTCNCHSFVANCLNRLCYGGSMEUNMVNVAILLMIKGKUIN 21M 

30 Query'- Ibl VGAFVKTULPFILL — LGIIL 171 

+ V+++LP ++ LG++L 
Sbjct: 215 GSSVVRSFLPCAVVTSLGVVL 235 

Score = 3b (5-H bits)-, Expect = H.Se-21-, Sum P(2) = 4.Se-21 
35 Identities = 7/21 (33X)-, Positives = 1H/21 (bb5c) 

tiuery: ID AETDMK<3Y(2GSGGV AMD VERS 30 
++ ++K +G G MD++RS 

Sbjct: 12 sdrnlkmsrgrgvpmmdlkrs 32 

40 

Pedant information for DKFZphfbr2_78d4, frame 1

Report for DKFZphfbr2_78d4.1

[LENGTH]   188
[MW]       21178.66
[pI]       6.27
[HOMOL]    PIR:T02616 hypothetical protein T19L18.12 - Arabidopsis thaliana 7e-32
[PROSITE]  CYTOCHROME_C 1
[KW]       TRANSMEMBRANE 1



SE(3 MAEVEAPTAAETDMK<3Yl3GSGGVAMDVERSRFPYCVVUTPIPVLTWFFPIIGHMGICTST 
PRD cccccchhhhhhhhhhccccccccccccccccccceeeccceeeeeeeeecccceeecce 
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SEfl GVIRDFAGPYFVSEDNMAFGKPAKYUKLDPAflVYASGPNAUDTAVHDASEEYKHRflHNLC 

PRJ> eeeeccccccccccccccccccceeeeccccceeeccccccccccccccchhhhhhhhee 

5 MEM . 

SE<2 CDNCHSHVALALNLIIRYNNSTNUNMVTLCFFCLLYGICYVSVGAFV/KTIiILPFILLLGIILT 

PRD ecccchhhhhhhhhhhccccccchhhhhhhhhhhccceeeeeeeeeeeccceeeeceeec 

MEM MMMMMMMMnririri 



10 



SEC 
PRD 

riEn 



VSLVFNLR 



ceeeeccc 



winnn . - • 



Prosite for DKFZphfbr2_78d4.1

PS00190   121->127   CYTOCHROME_C   PDOC00169

(No Pfam data available for DKFZphfbr2_78d4.1)
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5 group: brain derived 

DKFZphfbr2_78e18 encodes a novel 307 amino acid protein without
similarity to known proteins.

The mRNA is differentially polyadenylated.

No informative BLAST results; no predictive Prosite, Pfam or SCOP
motifs.

The new protein can find application in studying the expression
profile of brain-specific genes.

similarity to hypothetical protein of Arabidopsis thaliana

differential polyadenylation
> 7 exons

complete on human genomic clone M51B21ap.
perhaps complete cds.

Sequenced by MediGenomix

Locus: /map="mM-5D cR from top of Chrb linkage group"

Insert length: 3096 bp
Poly A stretch at pos. 3075, polyadenylation signal at pos. 3047

   1 TGGTGAGTTC GGAGTAGAGA TGGCCGCGCT TGCACCGCTG CCCCCGCTCC
  51 CCGCACAGCT CAAGAGCATA CAGCATCATC TGAGGACGGC TCAGGAGCAT
 101 GACAAGCGAG ACCCTGTGGT GGCTTATTAC TGTCGTTTAT ACGCAATGCA
 151 GACTGGAATG AAGATCGATA GTAAAACTCC TGAATGTCGC AAATTTTTAT
 201 CAAAGTTAAT GGATCAGTTA GAAGCTCTAA AGAAGCAGTT GGGTGATAAT
 251 GAAGCTATTA CTCAAGAAAT AGTGGGCTGT GCCCATTTGG AGAATTATGC
 301 TTTGAAAATG TTTTTGTATG CAGACAATGA AGATCGTGCT GGACGATTTC
 351 ACAAAAACAT GATCAAGTCC TTCTATACTG CAAGTCTTTT GATAGATGTC
 401 ATAACAGTAT TTGGAGAACT CACTGATGAA AATGTGAAAC ACAGGAAGTA
 451 TGCCAGATGG AAGGCAACAT ACATCCATAA TTGTTTAAAG AATGGGGAGA
 501 CTCCTCAAGC AGGCCCTGTT GGAATTGAAG AAGATAATGA TATTGAAGAA
 551 AATGAAGATG CTGGAGCAGC CTCTCTGCCC ACTCAGCCAA CTCAGCCATC
 601 ATCATCTTCA ACTTATGACC CAAGCAACAT GCCATCAGGC AACTATACTG
 651 GAATACAGAT TCCTCCGGGT GCACACGCTC CAGCTAATAC ACCAGCAGAA
 701 GTGCCTCACA GCACAGGTGT AGCAAGTAAT ACTATCCAAC CTACTCCACA
 751 GACTATACCT GCCATTGATC CCGCACTTTT CAATACAATT TCCCAGGGGG
 801 ATGTTCGTCT AACCCCAGAA GACTTTGCTA GAGCTCAGAA GTACTGCAAA
 851 TATGCTGGCA GTGCTTTGCA GTATGAAGAT GTAAGCACTG CTGTCCAGAA
 901 TCTACAAAAG GCTCTCAAGT TACTGACGAC AGGCAGAGAA TGAAGCCTTT
 951 GTATGACAGA CCCATGTATT TTTGGCATGA GGAACTAACA GTCCATTACT
1001 CTATCTTCAG CCTATCAGGA TCACAGTTTT AAGGAAGACT TGGTTTTGTT
1051 GAATATGACA ATGAAATCTG TGTGTATCAG ATTTTTATTG AAGCATTCAT
1101 CAGCAGCCTC AACCAGTTTT CATTGTCCAT TTACTAGATT CAATCGTCTC
1151 TGAGTATATA GGGCTGATGT TAGCAAGACC CTAAAAATGT CCATTGAACC
1201 CTGCTTCAAA AAATGAAAAC ACACCTCTAT AAAATGTGTA CTGGGAATAA
1251 GCTTTGTATT TACATACATT AGGGGAATTT TTTAAAATCT GTAATGTTTG
1301 GACAAACAGA TGATATTACT TTGCTATAAA ATTATAAATG TAACTTTTAA
1351 TAAAGATAGC CAGAATATTC TAAATTAGAA ATTACGTTTT TGTTTCCCTC
1401 AAGACATAAA ACAAATATAA ACATTCTAAA CTGCTGGATG AATCTGAAAA
1451 GACATTAAGT TCAAATTTTA ATTTATTCTC ATATTAAATA TAACTCCATT
1501 AAAAGTTTAA AATTTCATGG GAGAAAATAT AATAAGGTAA AGAGGTAGAA
1551 TCACTTTCAG ACTTAAGAAT AATGTTGATT TCCCAAGTGC TTTACCTTAT
1601 CTGTTAAAGC GTAAGATGAA TTGGTATTTG CTTCATAGGC AGTTTGACTG
1651 CATGTATTAG AGAATGAAAA GAAGATATTT GTAGTAATGC CTGGAAACTT
1701 GGTGCTTTAA ATTAAGGTAC TCCTCTGCTG CTGTAGAATG GATTCCACAC
1751 AGTGGATAGC TATGGGTGAT TCAGAATATT ATGTTTAGAT TCCCATTTGT
1801 TAAGTTTATA AGTTTTGTGG GGAATTATGA ACTTACTGTG TACTACCTGC
1851 ATTTGTGCTG TGTGAAAAAT AAATACAAGG ATTCGTTTAG CTAATTCAAC
1901 TTACTACAAA GACAAATGTC TGTTTTTATT TGCCTGCTAG GATTGTCTTT
1951 TTTAAAAGTC ATTTTTATTT ATAGGAATAT GGGTGTTTCT ATAGGAAGAA
2001 ACAGGTTTTT TGTTTTTTGT TTTTTAAGAT AAATTTGACA AAGTTAACTG
2051 AAATTTATCT GGTCCATTTT ATTCATGCTA CTAAGATGGG AATCTTTAAA
2101 CACAAGGGTC AGCAAGCTTT GGCCCATGGA TTGGCCACCT GTTACGTAAA
2151 TAAAGTTTCT TTGAAACAAG CCTACACTCA TTCATTTATG TTTTGTCTGT
2201 GGTTGCTTTC CACAACTGCA GAGTTGTATG GCTTGCAAGT CTAAAAACAT
2251 TTACTATTTG GCCCTCTAAG AAAAAGTTAA GACACCTAGT CTAATGGCCT
2301 TTTGGGAAAA AACAAATCAC TAACTCATAA TCATTTATAT CCATTATTTT
2351 CTGCATAAAT GTAATGCTAT TGTACAGGGT TTGGTAGAAT AAATATTCAG
2401 ACTGACTAAA CTGTTCTAAA TTCTCACAAA AAAGTCCCCA AACAACATGC
2451 CTCCTAAAAA ACATTTTCCT ATCTTTTACA AGAGGTATGA ACATTTGTAG
2501 GGTTCCACAT TTGCATCTAG AAATCCAATG CTCTTTAGAA TGTTATTACG
2551 AATAGAAAGA TGGCCAGGAT GACCTTTAGT GTTACATGAT GTTCAGCAAA
2601 TTTTAATTCA AACCTTGATA TGCCTGGACA CTGAAAAGTA AACGCATCAC
2651 CTCCTATTTT ATACCCTACC TTCTGGTTCC CAATTGGGAG AGCACATAGA
2701 GGGAAGGAGA CAATATAGAA ACTACGGAGT CCGCTGGTAG TGGGCTGCAT
2751 GGTGTGACAG AGCCCTTCTC TGTAAAATGG AAATGACACC ACTAGCCATC
2801 TCAATAGTTA CAAGAATTAA AAGAGATACA GTACCTGAAG TGCTTAGCGC
2851 ATGGTAGCAT TTCATAAATG TTTAGTGTCA ATACTAATGC TCTAATAATG
2901 TAAATTGTTA ATAATTTATT TCCCTAATAT CAGGAAATCC CAGTTGTCTA
2951 TGTGGCCCAG TGCTTAAAAA CGCCTTCTTG CATGAGGGGA TTGAACTATA
3001 CAATGTTTGT TAACTTTGTA TTTGTATTTT TTCCTATAAA ATCTTAAAAT
3051 AAAATTAGGA GATGTGTTCT GATGTAAAAA AAAAAAAAAA AAAAAA



BLAST Results 

40 

Entry HSM51B21 from database EMBL: 

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 
H51B21 

45 Score - 112M-1 P = O-Oe+OD-i identities = 2267/23M3 



Medline entries

No Medline entry

Peptide information for frame 2

ORF from 20 bp to 940 bp; peptide length: 307
Category: similarity to unknown protein
Classification: no clue

   1 MAALAPLPPL PAQLKSIQHH LRTAQEHDKR DPVVAYYCRL YAMQTGMKID
  51 SKTPECRKFL SKLMDQLEAL KKQLGDNEAI TQEIVGCAHL ENYALKMFLY
 101 ADNEDRAGRF HKNMIKSFYT ASLLIDVITV FGELTDENVK HRKYARWKAT
 151 YIHNCLKNGE TPQAGPVGIE EDNDIEENED AGAASLPTQP TQPSSSSTYD
 201 PSNMPSGNYT GIQIPPGAHA PANTPAEVPH STGVASNTIQ PTPQTIPAID
 251 PALFNTISQG DVRLTPEDFA RAQKYCKYAG SALQYEDVST AVQNLQKALK
 301 LLTTGRE



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphfbr2_78e18, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphfbr2_78e18, frame 2

Report for DKFZphfbr2_78e18.2



ELENGTH3 313 

30 EI1UJ 3MMb3.^5 

EpU 5-bM 

EHOIIOLID PIR:T0L47T8 hypothetical protein F10n23-^0 - 

Arabidopsis thaliana 3e-22 

EKUJ All_Alpha 

35 EKliD - L0U_C0MPLEXITY -It-bl X 

SEtf GEFGVEHAALAPLPPLPA(3LKSIc3HHLRTA(3EHDKRDPVVAYYCRLYAI1(3TG[1KIDSKTP 

SEG xxxxxxxxxxxxx 

40 PRD ccchhhhhheeecccccchhhhhhhhhhhhhhhhcccceeehhhhhhhhhhccccccccc 

SE(3 ECRKFLSKLNDi2LEALKK(3LGDNEAIT<3EIVGCAHLENYALKnFLYADNEDRAGRFHKNn 

SEG 

PRD chhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhccccccccccchh 

45 

SE<2 IKSFYTASLLIDVITVFGELTDENVKHRKYARldKATYIHNCLKNGETPflAGPVGIEEDND 

SEG - xxxxxxx 

PRD hhhhhhhhhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhccccccccccccccccc 

50 SEG IEENEDAGAASLPTtfPTflPSSSSTYDPSNNPSGNYTGIfllPPGAHAPANTPAEVPHSTGV 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SES ASNTI(3PTPc3TIPAIDPALFNTIS(2GDVRLTPEDFARAi3KYCKYAGSAL(3YEDVSTAVr3N 

55 SEG 

PRD cccccccccccccccccccccccccccccccchhhhhhhhhhhhhcceeeecchhhhhhh 

SE(3 LtfKALKLLTTGRE 
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SEG 

PRD hhhhhhhhccccc 

(No Prosite data available for DKFZphfbr2_78e18.2)

(No Pfam data available for DKFZphfbr2_78e18.2)
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5 group: metabolism 

DKFZphfbr2_78i21 encodes a novel 477 amino acid protein with
similarity to beta-aspartate methyltransferases.

The L-isoaspartyl methyltransferase (Pimt), as an example, is a
highly conserved enzyme utilising S-adenosylmethionine (AdoMet)
to methylate aspartate residues of proteins damaged by
age-related isomerisation and deamidation.

The new protein can find application in diagnosis/modulation of
protein damage and age-related degenerative processes.

unknown protein

weak similarity to beta-aspartate methyltransferase pimT of
Mycobacterium leprae
perhaps complete cds.

Sequenced by MediGenomix

Locus: unknown

Insert length: 1842 bp
Poly A stretch at pos. 1822, polyadenylation signal at pos. 1802

   1 CCTTCGCGAA ACACTATGCT AATGGCATGG TGCCGCGGTC CTGTCTTGCT
  51 GTGCCTGCGG CAGGGGCTCG GAACCAATTC ATTCCTGCAC GGCCTGGGGC
 101 AGGAGCCCTT CGAGGGAGCT CGGTCACTGT GTTGCAGGTC CTCGCCTAGA
 151 GACCTGCGAG ATGGAGAAAG AGAGCACGAG GCGGCACAAA GGAAAGCCCC
 201 AGGAGCAGAG TCTTGCCCAT CTCTCCCTCT GAGCATCTCG GACATTGGGA
 251 CTGGATGTCT TTCGTCACTG GAAAACCTCA GACTGCCGAC GCTGCGGGAA
 301 GAGTCATCCC CTCGAGAGCT CGAGGACTCG AGCGGAGACC AGGGCCGGTG
 351 CGGTCCCACA CACCAGGGAT CCGAGGATCC TTCGATGCTC TCGCAGGCCC
 401 AGTCCGCTAC CGAGGTCGAA GAGCGTCACG TCTCCCCTTC TTGTTCAACT
 451 TCCAGAGAGA GACCCTTTCA GGCTGGGGAA CTGATTTTAG CTGAGACTGG
 501 GGAGGGAGAA ACAAAATTTA AGAAATTATT TAGGTTGAAC AACTTCGGAC
 551 TCTTAAATAG TAACTGGGGG GCAGTCCCGT TCGGCAAGAT CGTGGGGAAG
 601 TTCCCCGGCC AGATACTGAG GAGTTCCTTC GGTAAGCAGT ACATGCTGAG
 651 GAGGCCAGCC TTGGAAGACT ATGTAGTATT GATGAAAAGA GGGACTGCCA
 701 TAACATTCCC AAAGGATATT AATATGATTC TCTCAATGAT GGATATCAAC
 751 CCAGGTGATA CTGTTTTGGA AGCTGGCTCA GGCTCTGGTG GAATGAGCTT
 801 ATTTTTATCC AAAGCAGTTG GATCACAAGG ACGAGTCATA AGTTTTGAGG
 851 TACGAAAAGA CCACCATGAT CTGGCTAAGA AGAATTACAA ACACTGGCGT
 901 GATTCATGGA AATTAAGTCA TGTAGAAGAG TGGCCAGACA ATGTGGATTT
 951 TATTCATAAG GACATTTCAG GAGCAACCGA AGACATAAAA TCTTTAACAT
1001 TTGACGCAGT AGCTTTGGAT ATGTTAAATC CTCATGTTAC TTTGCCTGTT
1051 TTTTACCCAC ATCTTAAGCA TGGTGGTGTA TGTGCTGTAT ATGTAGTAAA
1101 CATCACACAG GTTATTGAAC TTTTAGATGG AATTCGCACC TGTGAACTTG
1151 CTCTTTCATG TGAAAAGATA AGCGAGGTCA TTGTCAGAGA TTGGTTGGTT
1201 TGCCTTGCAA AACAGAAAAA TGGAATTTTA GCTCAAAAAG TAGAATCTAA
1251 AATCAACACA GATGTACAAC TAGATTCTCA AGAGAAAATT GGAGTTAAAG
1301 GTGAGCTGTT TCAAGAGGAT GACCATGAAG AATCGCATTC TGATTTTCCA
1351 TATGGATCAT TTCCCTATGT TGCTAGACCA GTACACTGGC AACCTGGTCA
1401 TACAGCTTTT CTTGTCAAGT TGAGGAAGGT CAAACCACAA CTTAACTGAG
1451 TACTCCAGAT GACAGTAACT GACTTGAAGA TGGAAAAATA TCAAAATAGA
1501 ACTTTATATT GAAAATCACT GCTTCCATAG ATTGGCATTT TTAGCTATTA
1551 CTATGACTTA TATAACTTAT ACATATAATT TTGAAAATAA CAACTAAAAG
1601 ATGTATAACA TAGCAAAACT GCTTAAACAT CCCATTTTGA CACTTGTCTT
1651 GCAGTTAGTT TGACATTTTG TAGTTAATGA TTCCAAATTG GTTTAGTTGG
1701 GCCATCTCAT TCTTCACTTC CTGTAAACCA CTCCATAGAT TTGTCTTTCT
1751 TCAAGAAATT AGTTTTCTTT CCTTTATTTG ATTGATGGTC ATTGACTACT
1801 GAAATAAAAT ATGCATTTTA AGAAAAAAAA AAAAAAAAAA AA



BLAST Results

No BLAST result

Medline entries

No Medline entry

Peptide information for frame 1

ORF from 16 bp to 1446 bp; peptide length: 477
Category: putative protein
Classification: no clue
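
The ORF coordinates and the peptide length in this report are linked by simple arithmetic: an ORF read here as spanning positions 16 to 1446 covers 1431 bases, that is 477 codons. The following Python sketch illustrates the relationship. It assumes 1-based inclusive coordinates that exclude the stop codon, and the standard genetic code; the function names and the 50-base test fragment (the first line of the nucleotide listing) are illustrative choices, not part of the original report.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def orf_peptide_length(start: int, end: int) -> int:
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    n_bases = end - start + 1
    if n_bases % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return n_bases // 3


def translate_orf(sequence: str, start: int, end: int) -> str:
    """Translate sequence[start..end] (1-based, inclusive) codon by codon."""
    orf = sequence[start - 1:end].upper()
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf) - 2, 3))


# First 50 bases of the insert, as printed in the nucleotide listing (assumed reading).
FIRST_50 = "CCTTCGCGAAACACTATGCTAATGGCATGGTGCCGCGGTCCTGTCTTGCT"

print(orf_peptide_length(16, 1446))      # 477
print(translate_orf(FIRST_50, 16, 45))   # MLMAWCRGPV
```

The same length check applies to any entry of the listing whose ORF coordinates are given in this form.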



1 MLMAWCRGPV LLCLRQGLGT NSFLHGLGQE PFEGARSLCC RSSPRDLRDG
51 EREHEAAQRK APGAESCPSL PLSISDIGTG CLSSLENLRL PTLREESSPR
101 ELEDSSGDQG RCGPTHQGSE DPSMLSQAQS ATEVEERHVS PSCSTSRERP
151 FQAGELILAE TGEGETKFKK LFRLNNFGLL NSNWGAVPFG KIVGKFPGQI
201 LRSSFGKQYM LRRPALEDYV VLMKRGTAIT FPKDINMILS MMDINPGDTV
251 LEAGSGSGGM SLFLSKAVGS QGRVISFEVR KDHHDLAKKN YKHWRDSWKL
301 SHVEEWPDNV DFIHKDISGA TEDIKSLTFD AVALDMLNPH VTLPVFYPHL
351 KHGGVCAVYV VNITQVIELL DGIRTCELAL SCEKISEVIV RDWLVCLAKQ
401 KNGILAQKVE SKINTDVQLD SQEKIGVKGE LFQEDDHEES HSDFPYGSFP
451 YVARPVHWQP GHTAFLVKLR KVKPQLN



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_78i21, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphfbr2_78i21, frame 1



Report for DKFZphfbr2_78i21.1
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CLENGTH3 MfiE 
EMU! 53521. ED 

5 Epll b-Bfl 

CHOMOLI TREMBL : AFDfififlD0_2 product: "unknown"; Rhodococcus 

erythropolis ARC (arc) gene, complete cdsi and unknown genes- Se- 
23 

EFUNCAT3 r general function prediction III. jannaschiii 

10 nJ013M3 be-10 

CFUNCAT3) DS.D7 translational control IS. cerevisiae-. YJLiaSc3 
be-DM 

IBLOCKSJ BLOOfiOlE 
CBL0CKS3 BL0127TA 
15 EKtO Alpha_Beta 

EKIO L0U_C0MPLEXITY 5 • n V. 

SEQ PSRNTMLMAWCRGPVLLCLRQGLGTNSFLHGLGQEPFEGARSLCCRSSPRDLRDGEREHE
SEG
PRD cccceeeeeecccccchhhhhccccceeeeeccccceeeceeeeccccccccccchhhhh

SEQ AAQRKAPGAESCPSLPLSISDIGTGCLSSLENLRLPTLREESSPRELEDSSGDQGRCGPT
SEG
PRD hhhhhccccccccccceeeeeecccccccceeeccccccccccccccccccccccccccc

SEQ HQGSEDPSMLSQAQSATEVEERHVSPSCSTSRERPFQAGELILAETGEGETKFKKLFRLN
SEG
PRD cccccccchhhhhhhhhhhhhccccccccccccccccccceeeecccccccceeeeeecc

SEQ NFGLLNSNWGAVPFGKIVGKFPGQILRSSFGKQYMLRRPALEDYVVLMKRGTAITFPKDI
SEG
PRD ccccccccccccccceeeccccceeeeecccceeeeccchhhhhhhhhhccceeeecccc

SEQ NMILSMMDINPGDTVLEAGSGSGGMSLFLSKAVGSQGRVISFEVRKDHHDLAKKNYKHWR
SEG - xxxxxxxxxxxx
PRD cceeecccccccceeeeeccccchhhhhhhhhccccceeeeeehhhhhhhhhhhhhhhhh

SEQ DSWKLSHVEEWPDNVDFIHKDISGATEDIKSLTFDAVALDMLNPHVTLPVFYPHLKHGGV
SEG . .
PRD hccccccccccccceeeeecccccccccccccccceeeecccccccchhhhhhhcccccc

SEQ CAVYVVNITQVIELLDGIRTCELALSCEKISEVIVRDWLVCLAKQKNGILAQKVESKINT
SEG
PRD eeeeeechhhhhhhhhhhhhhhhhhhhccceeeeeehhhhhhhhhhccceeeeccccccc

SEQ DVQLDSQEKIGVKGELFQEDDHEESHSDFPYGSFPYVARPVHWQPGHTAFLVKLRKVKPQ
SEG
PRD cccccccccccccccccccccccccccccccccccccccccccccccceeeeeccccccc

SEQ LN
SEG - -
PRD cc



(No Prosite data available for DKFZphfbr2_78i21.1)
(No Pfam data available for DKFZphfbr2_78i21.1)
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DKFZphmel2_12jl 



group: melanoma derived 

DKFZphmel2_12jl encodes a novel 905 amino acid protein, which has
similarity to integrin I of Saccharomyces cerevisiae.

The novel protein contains a leucine zipper.
No informative BLAST results; no predictive Prosite, Pfam or SCOP
motif.

The new protein can find application in studying the expression
profile of melanoma-specific genes.



weak similarity to integrin I (Saccharomyces cerevisiae)
Sequenced by EMBL
Locus: unknown

Insert length: 2942 bp

Poly A stretch at pos. 2 c IPb, no polyadenylation signal found



1 CGAAAGCTAA AGGCCGGCGC ACGCTGGGCG GTGGTGGTCC CTAAGCCGGG
51 CCGCGGCCGG TGCAATGGAC TCCACTGCCT GCTTGAAGTC CTTGCTCCTG
101 ACTGTCAGTC AGTACAAAGC CGTGAAGTCA GAGGCGAACG CCACTCAGCT
151 TTTGCGGCAC TTGGAGGTAA TTTCTGGACA GAAACTCACA CGACTATTTA
201 CATCAAATCA GATATTAACA AGTGAATGCT TGAGTTGCCT TGTAGAGCTA
251 CTTGAAGACC CCAACATAAG TGCTTCACTG ATCTTAAGTA TTATCGGTTT
301 GCTGTCTCAA CTAGCAGTAG ACATTGAAAC CAGAGATTGT CTTCAGAATA
351 CATATAATCT GAATAGTGTG CTGGCGGGAG TGGTTTGTCG GAGCAGCCAC
401 ACTGATTCGG TGTTTTTGCA GTGCATTCAA CTTCTACAGA AGTTAACATA
451 TAATGTCAAA ATTTTCTATT CTGGTGCCAA TATAGATGAA TTAATTACGT
501 TCCTGATAGA TCACATTCAA TCTTCTGAAG ATGAGTTAAA AATGCCTTGT
551 CTAGGATTAT TGGCAAATCT TTGTCGGCAC AATCTTTCTG TTCAAACGCA
601 CATAAAGACA TTGAGTAATG TGAAATCTTT TTATCGAACT CTTATCACCT
651 TGTTGGCCCA TAGTAGTTTA ACTGTGGTTG TGTTTGCACT TTCAATATTA
701 TCCAGTTTGA CATTAAATGA AGAGGTGGGG GAAAAGCTAT TCCATGCTCG
751 AAACATTCAT CAGACTTTTC AACTAATATT TAATATTCTC ATAAACGGTG
801 ATGGCACTCT AACTAGAAAG TATTCAGTTG ACCTACTGAT GGATCTCCTT
851 AAGAATCCTA AAATTGCTGA TTATCTCACC AGATATGAGC ACTTTTCTTC
901 ATGTCTTCAC CAAGTATTAG GTCTTCTTAA TGGAAAGGAT CCTGATTCCT
951 CTTCAAAGGT TTTAGAATTA CTTCTTGCCT TCTGTTCAGT GACTCAGCTG
1001 CGCCATATGC TCACTCAGAT GATGTTTGAA CAGTCTCCAC CTGGCAGCGC
1051 CACTCTGGGA AGCCATACTA AATGTTTAGA ACCTACTGTG GCTCTACTGC
1101 GCTGGTTAAG CCAACCTTTG GACGGATCAG AAAACTGTTC TGTTTTAGCA
1151 TTGGAGTTGT TCAAGGAAAT ATTTGAGGAT GTCATAGATG CTGCTAACTG
1201 TTCCTCGGCT GATCGTTTTG TGACCCTTCT GCTGCCTACA ATCCTTGATC
1251 AACTTCAGTT CACAGAACAA AATCTAGATG AGGCTTTAAC AAGAAAAAAT
1301 GTGAAAGGGA TTGCCAAGGC CATTGAAGTT TTGTTAACTC TCTGTGGAGA
1351 TGATACACTA AAAATGCATA TTGCAAAAAT CTTGACAACT GTCAAGTGTA
1401 CCACTCTTAT AGAACAACAA TTTACATATG GCAAGATTGA CCTGGGATTT
1451 GGAACAAAGG TTGCAGATTC TGAATTATGC AAACTTGCTG CTGATGTAAT
1501 TTTGAAAACT CTTGATTTGA TTAACAAACT TAAACCATTG GTTCCTGGTA
1551 TGGAAGTAAG CTTCTACAAA ATACTTCAGG ACCCACGTTT GATTACTCCT
1601 TTGGCTTTTG CTTTAACGTC AGATAATAGA GAACAAGTAC AGTCTGGACT
1651 GAGAATATTA TTGGAGGCTG CTCCACTGCC AGATTTTCCT GCTTTAGTAC
1701 TTGGAGAAAG TATAGCAGCA AACAATGCCT ATAGACAACA GGAAACAGAA
1751 CATATACCCA GAAAAATGCC CTGGCAATCA TCAAATCACA GTTTTCCAAC
1801 ATCAATAAAG TGTTTAACTC CTCATTTGAA AGATGGTGTT CCTGGATTGA
1851 ATATTGAAGA ATTAATAGAG AAACTTCAGT CTGGAATGGT GGTAAAGGAT
1901 CAGATTTGTG ATGTGAGAAT ATCTGACATA ATGGATGTAT ATGAAATGAA
1951 ACTATCCACA TTAGCTTCCA AAGAAAGCAG GCTACAAGAT CTTTTGGAAA
2001 CAAAAGCTCT AGCCCTTGCA CAGGCTGATA GACTGATTGC TCAGCATCGC
2051 TGTCAAAGAA CTCAAGCTGA AACAGAGGCA CGGACACTTG CTAGTATGTT
2101 GAGAGAAGTT GAGAGAAAAA ATGAAGAGCT TAGTGTGTTG CTGAAGGCGC
2151 AGCAAGTTGA ATCAGAAAGA GCGCAGAGTG ATATTGAGCA TCTCTTTCAA
2201 CATAATAGGA AGTTAGAGTC TGTGGCTGAA GAACATGAAA TACTGACAAA
2251 ATCCTACATG GAACTTCTTC AGAGAAATGA AAGTACTGAA AAGAAGAATA
2301 AAGATTTACA GATCACATGT GATTCTCTGA ATAAACAAAT TGAGACAGTG
2351 AAAAAGTTGA ATGAGTCACT CAAGGAACAA AATGAAAAAA GTATTGCCCA
2401 ATTAATAGAG AAAGAAGAAC AGAGAAAAGA AGTACAGAAT CAGCTAGTAG
2451 ACAGAGAACA TAAGCTAGCA AATTTGCATC AAAAAACAAA AGTACAAGAA
2501 GAAAAGATTA AAACCTTACA AAAGGAAAGG GAAGATAAGG AAGAAACCAT
2551 TGATATCCTT AGAAAAGAAT TAAGCAGAAC AGAACAGATA AGAAAAGAGT
2601 TGAGCATTAA GGCTTCCTCC CTAGAGGTTC AAAAGGCACA ATTAGAAGGT
2651 CGTTTGGAAG AGAAAGAGTC CTTGGTGAAA CTTCAGCAAG AGGAATTGAA
2701 CAAACACTCC CACATGATAG CAATGATCCA CAGTTTAAGT GGTGGAAAAA
2751 TAAATCCAGA AACTGTGAAT CTCAGTATAT AGACATTATG GCATTTTGGA
2801 ATTTGTAATC TCATGATATT TTTGATGTAT TTATCTATTG GAGGGGGGGT
2851 GGGTAGGGGA GTTAATTTGT GACTTCGTAA CAATAAGAAG TTATTATCTA
2901 ATTTAGTAAA GACCCTGATC TGTTGCAAAA AAAAAAAAAA AA

BLAST Results

No BLAST result

Medline entries

^03^111: 

Hostetter MK, Tao NJ, Gale C, Herman DJ, McClellan M, Sharp RL,
Kendrick KE; Antigenic and functional conservation of an
integrin I-domain in Saccharomyces cerevisiae. Biochem Mol Med
1995 Aug;55(2):125-30

Berton G, Lowell CA; Integrin signalling in neutrophils and
macrophages. Cell Signal 1999 Sep;11(9):621-35



Peptide information for frame 2

ORF from 65 bp to 2779 bp; peptide length: 905
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Category: putative protein 

Classification: Cellular transport and traffic 
Prosite motifs: LEUCINE_ZIPPER (331-352) 
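
A leucine zipper hit of this kind corresponds to four leucines spaced seven residues apart (the PROSITE-style pattern L-x(6)-L-x(6)-L-x(6)-L), which spans 22 residues and so matches the reported range 331-352. The Python sketch below shows one way to scan a peptide for that pattern. The function name is hypothetical, and the test segment is an assumed reading of residues 331-352 obtained by translating the ORF; neither is taken from the Prosite scan itself.

```python
import re

# L-x(6)-L-x(6)-L-x(6)-L written as a regular expression. The lookahead
# allows overlapping candidate sites to be reported as well.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(peptide: str, offset: int = 1):
    """Return (start, end, match) tuples using 1-based inclusive positions.

    `offset` is the sequence position of the first character of `peptide`.
    """
    hits = []
    for m in LEUCINE_ZIPPER.finditer(peptide):
        start = m.start() + offset
        hits.append((start, start + len(m.group(1)) - 1, m.group(1)))
    return hits


# Assumed residues 331-352 of the 905 amino acid peptide.
SEGMENT = "LGSHTKCLEPTVALLRWLSQPL"

print(find_leucine_zippers(SEGMENT, offset=331))
# [(331, 352, 'LGSHTKCLEPTVALLRWLSQPL')]
```

A pattern this short occurs by chance in many proteins, so a match is only a hint that a zipper may be present.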

1 MDSTACLKSL LLTVSQYKAV KSEANATQLL RHLEVISGQK LTRLFTSNQI
51 LTSECLSCLV ELLEDPNISA SLILSIIGLL SQLAVDIETR DCLQNTYNLN
101 SVLAGVVCRS SHTDSVFLQC IQLLQKLTYN VKIFYSGANI DELITFLIDH
151 IQSSEDELKM PCLGLLANLC RHNLSVQTHI KTLSNVKSFY RTLITLLAHS
201 SLTVVVFALS ILSSLTLNEE VGEKLFHARN IHQTFQLIFN ILINGDGTLT
251 RKYSVDLLMD LLKNPKIADY LTRYEHFSSC LHQVLGLLNG KDPDSSSKVL
301 ELLLAFCSVT QLRHMLTQMM FEQSPPGSAT LGSHTKCLEP TVALLRWLSQ
351 PLDGSENCSV LALELFKEIF EDVIDAANCS SADRFVTLLL PTILDQLQFT
401 EQNLDEALTR KNVKGIAKAI EVLLTLCGDD TLKMHIAKIL TTVKCTTLIE
451 QQFTYGKIDL GFGTKVADSE LCKLAADVIL KTLDLINKLK PLVPGMEVSF
501 YKILQDPRLI TPLAFALTSD NREQVQSGLR ILLEAAPLPD FPALVLGESI
551 AANNAYRQQE TEHIPRKMPW QSSNHSFPTS IKCLTPHLKD GVPGLNIEEL
601 IEKLQSGMVV KDQICDVRIS DIMDVYEMKL STLASKESRL QDLLETKALA
651 LAQADRLIAQ HRCQRTQAET EARTLASMLR EVERKNEELS VLLKAQQVES
701 ERAQSDIEHL FQHNRKLESV AEEHEILTKS YMELLQRNES TEKKNKDLQI
751 TCDSLNKQIE TVKKLNESLK EQNEKSIAQL IEKEEQRKEV QNQLVDREHK
801 LANLHQKTKV QEEKIKTLQK EREDKEETID ILRKELSRTE QIRKELSIKA
851 SSLEVQKAQL EGRLEEKESL VKLQQEELNK HSHMIAMIHS LSGGKINPET
901 VNLSI

BLASTP hits 

No BLASTP hits available

Alert BLASTP hits for DKFZphmel2_12jl, frame 2

TREMBL:SCINTANA_1 Saccharomyces cerevisiae integrin analogue
gene, complete cds, N = 1, Score = 216, P = 1.3e-13

>TREMBL:SCINTANA_1 Saccharomyces cerevisiae integrin analogue
gene, complete cds.

Length = 1,015



HSPs:

Score = 216 (32.4 bits), Expect = 1.3e-13, P = 1.3e-13
Identities = 80/302 (26%), Positives = 155/302 (51%)
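
The percentages in an HSP header follow directly from the counts. Reading this first HSP as 80 identities and 155 positives over 302 aligned positions, the reported 26% and 51% are the integer parts of the ratios. The short Python sketch below makes that explicit. Truncation rather than rounding is an assumption inferred from the figures in this listing, and the function name is illustrative.

```python
def hsp_percent(count: int, aligned: int) -> int:
    """Integer percentage of matching positions in an alignment.

    Uses floor division, i.e. the fractional part is dropped
    (an assumption inferred from the values in this listing).
    """
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return count * 100 // aligned


print(hsp_percent(80, 302))    # 26  (identities)
print(hsp_percent(155, 302))   # 51  (positives)
```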



(Suery: 5^7 IEELIEKLflSGriVVKDfllCDVRISDIII 

50 DVYEMKLSTLASKESRLODLLETKALALAfl b53 

I L EKL++ D+ + +IS++ + E +L+ + ++ L+ 

LET AL + 

So jet: 275 ISLLKEKLETATTANDENVN- 
KISELTKTREELEAELAAYKNLKNELETKLETSEKALKE 333 



(Suery: bSM A DRLIA(2HRC(3RT(3AETEAR TLASMLREVERKNEELSVLLKA- 

<3fiVESERA<J 70M 
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+ +TE+ +L+L +E+++E+L+ LK 

+ Q + ++ Q 
Sbjct: 33M 

VKENEEHLKEEKIflLEKEATETKfi<3LNSLRANLESLEKEHEDLAA(2LKKYEE(2IANKER(2 3^3 

5 

fluery: 7DS SDIEHLFflHNRKLESVAEEHEILTKSYMEL LflRNESTEKKNKDLfllT- 

CDSLNKfllE 7bD 

+ E + a N ++ S +E+E + K EL ++ +ST ++ +L+ + 

D+LN (31+ 
10 Sbjct: 3«m YN- 

EEISflLNDEITSTfldENESIKKKNDELEGEVKAPlKSTSEEOSNLKKSEIDALNLfllK USE 

fluery: 7bl 

TVKKLNESLKE(2NEKSIA(3LIEKEEc2RKEV£2Nc3LV]>REHKLANLH£2KTKV(2EEKIKT fil7 

15 + KK NE+ + + SI + + + KE+(3 + + +E +++ L K K 

E+K 

Sbjct: M53 

ELKKKNETNEASLLESIKSIESETVKIKELflDECNFKEKEVSELEDKLKASEDKNSKYLE 515 

20 fluery: fllfl LflKEREDKEETIDI LRKELSRTEOIRKELSIKASSLE- 

VOKAflLEGRLEEKESLVK fl?2 

LQKE E +E +D L+ +L + + K S L ++K E R 

+ E L K 

Sbjct: 513 

25 LflKESEKIKEELDAKTTELKIOLEKVTNLSKAKEKSESELSRLKKTSSEERKNAEEtfLEK 572 

Query- 873 LQQE fl7b 
L+ E 

Sbjct: 573 LKNE 57b 



Score = 186 (27.9 bits), Expect = 2.0e-10, P = 2.0e-10
Identities = 82/301 (27%), Positives = 155/301 (51%)



fluery: 51fl EELIEKLflSGMVVKDfllCDVRISDIflDVYEIIKLSTLASKESR L<2D- 

35 LLETKALALAfl b53 

+ELI + L<2 + +K + D S + + V L K++ LflD +L 

K 

Sbjct: bH DELI- 

RL<3NENELKAKEIDNTRSELEKVSLSNI>ELLEEK<2NTIKSL<3DEILSYKDKITRN bb^ 

40 

Query: bSM 

A»RLIAi3HRC(3RTOAETEARTLASI1LREVERKNEELSVLLKA(2(3VESERAi3SDIEHLF(3H 713 
++L++ R + E+ L LR + ++ LK + ES + 

++++E + 

45 Sbjct: b?D DEKLLSIERDSKRDLES 

LKEt3LRAA(2ESKAK\/EEGLKKLEEESSKEKAELEKSKEI1 725 

Query: 71M NRKLESVAEEHEILTKSYI1ELLflRN-ESTEKKNKI>LiaiTCDSL- 
NKfllETVKKLNESLKE 771 
50 +KLES E +E KS (IE ++++ E E+ K + +L +++ + + 

++NES K+ 
Sbjct: 72b 

nKKLESTIESNETELKSSMETIRKSDEKLEflSKKSAEEDIKNLdHEKSDLISRINESEKD 7S5 

55 fluery: 772 ONE-KSIAflLIEKEEi2RKE-V«2N<2LVl>REHKL- 
ANLHQKTKVflEEKIKTLOKEREDKEET S2fl 

E KS ++ K E V+ +L + + K+ N + T V + K++ 

+++E + DK+ 
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Sbjct: 7flb IEELKSKLRIE AKSSSELETVK(2ELNNA<2EKIRVNAEENT- 
VLKSKLEDIERELKDKfiAE fii4M 

Query: BE1 II>ILR — KEL — SRTEfllRKEL SIKASSLEVQKAflLE- 

5 GRLEEKESLVKLfi fl?M 

I + KEL SR +++ +EL S + S EV + K <2+E 

+L+EK L++ + 
Sbjct: AMS 

IKSN<3EEKELLTSRLKELE(2ELDST(3(2KA<3KSEEESRAEVRKF<2VEKS(2LI>EICAI'1LLETK 

10 

(Suery: A7S (2EEL-NK flflO 
+L UK 

Sbjct: 105 YNDLVNK 111 

Score = 173 (26.0 bits), Expect = 5.9e-09, P = 5.9e-09
Identities = 77/287 (26%), Positives = 146/287 (50%)

fluery: bOl IEKL(2SGMVVK]><2ICDVRISI>im>VYEf1KLSTLASKES-- 
RLflDLLETKALALAflADRLI bSfl 
20 ++K + + K++ + IS + D E+ ST ES + D LE + 

A+ 

Sbjct: 3fiD LKKYEEdJI ANKERCYNEEISflLND — EIT- 
STflflENESIKKKNDELEGEVKAMKST N3B 

25 fluery: bST A«2HRC<2RT<3AETEARTLASMLREVERKNE~ 
ELSVLLKA(3i2VESERAi2SI>IEHLFi3H-NR 71S 

++ + ++E +A L ++E+++KNE E S+L + +ESE + 1+ 

L N 

Sbjct: H33 SEEcJSNLKKSEIDALNL — (2IKELKKKNETNEASLLESIKSIESETVK — 
30 IKELflDECNF i4fl6 

fiuery: 71b KLESVAEEHEILTKSY 

nELLflRNESTEKKNKDLtJITCDSLNKfllETVKKLNESLKECJ 775 

K + V+E + L S + L+ + +EK ++L L (3+E V 

35 L+++ KE+ 

Sbjct: MAI 

KEKEVSELEDKLKASEDKNSKYLELi2KESEKIKEELI>AKTTELKI<2LEKVTNLSKA-KEK 5^7 

<2uery: 773 NEKSIAfiLIE-KEEc2RKEV(2Nc2L — VDREHKLAN — 
40 LHflKTKVdEEKIKTLdJKEREDKEE AB7 

+E +++L + E+RK + QL + E ++ N ++ K+ E T+ 

+E + K 

Sbjct: SMfl 

SESELSRLKKTSSEERKNAEE<2LEKLKNEI<3IKN(2AFEKERKLLNEGSSTITj2EYSEKIN bD7 

45 

fluery: ABA TI- 

DILRKELSRTEt2IRKELSIKASSLEVc5KAflLEGRLEEKESLVKL(2(2EELNKHSHNI AAS 

T+ J> L + + E KE+ S LE + LEEK++ +K 0+E+ 

+ I 
50 Sbjct: bOfl 

TLEDELIRLtJNENELKAKEIDNTRSELEKVSLSNDELLEEKfiNTIKSLflDEILSYKDKI bbb 

Score = 171 (55-7 bits)-, Expect = 1.3e-01i P = T-Be-OT 
Identities = 7b/311 (B^)-. Positives = 1SB/311 (MS*) 

55 

fluery: Sib NIEELIEKLflSGH VVKDfl 

ICDVRISDIMDVYEIIKLSTLASKESRLdJDLLETKA bMA 
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N EE +EKL + + + + K+fl + + SI Y K ++TL + 

RLfl + E KA 
Sbjct: 5b5 

NAEEflLEKLKNEIfllKNflAFEKERKLLNEGSSTITflEYSEKINTLEDELIRLflNENELKA b24 

5 

fluery: b>n LALAflADRLIAflHRCfiRTflA-ETEARTLAStlLREVERKNEELSVL- 
LKAQflVESERAflSD 70b 

+ + + + E + T+ S+ E+ ++++ K 

+E + ++ D 
10 Sbjct: b25 

KEIDNTRSELEKVSLSNDELLEEKflNTIKSLflDEILSYKDKITRNDEKLLSIERD-SKRD bfl3 

fluery: 707 IEHLFflHNRKL-ESVAEEHEILTKSYMELLflRNESTEKKN ■ 

KDLfllTCDS LNKQ 756 

15 +E L + R ES A+ E L K E + EK K . L+ T +S 

L 

Sbjct: k&H 

LESLKEflLRAAGESKAKVEEGLKKLEEESSKEKAELEKSKEMMKKLESTIESNETELKSS 7M3 

20 fluery: 751 IETVKKLNESLKEONEKSIAflLIEK- 

EEflRKEVflNflLVDREHKLANLHflKTKVflEE K filM 

+ET ++K +E L E<2++KS +1+ + ++ ++ +++ + Z + L K 

+++ + + 

Sbjct: 7MM METIRKSDEKL- 
25 EflSKKSAEEDIKNLflHEKSDLISRINESEKDIEELKSKLRIEAKSSSE 602 

fluery: filS IKTLflKEREDKEETIDILRKE LSRTEfllRKELSIKASSL 

EVflKAflLEGRLEEK 6b? 

++T+++E + +E I + +E S+ EI +EL K + + + +K 

30 L RL+E 

Sbjct: 603 

LETVKflELNNAflEKIRVNAEENTVLKSKLEDIERELKDKflAEIKSNflEEKELLTSRLKEL fib5 

fluery: Aba ESLVKLflflEELNK 660 
35 E + fl++ K 

Sbjct: Sb3 EflELDSTflflKAflK 675 



40 



Score = IbS (SM-fl bits)i Expect = M-le-06-i P = M-le-Ofl 
Identities = b5/26b C 252) ■* Positives = l^/Efit (52/C) 



fluery: SIS LNIEELIEKLQSGMVVKDfllCDVR-ISDIIIDVYEriKLSTLASKESRL- 
(2DLLETKALALA bSS 

+N ++ + L+ + K I +++ I++ ++ +++ + L+ ++ + 

++L+E K+ 

45 Sbjct: 11M VNHflKETKSLKEDIAAK-- 

ITEIKAINENLEKflKIflCNNLSKEKEHISKELVEYKS-RFfl 170 

fluery: b53 flADRLIAflHRCflRTdAETEARTLASMLREVERKNEELSVLLKAAAVESE 

-RAflSDIE 70fl 

50 D L+A+ T+ + ++LA+ ++++ +NE L ++ + ES 

fl+ 1+ 

Sbjct: 171 SHDNLVAK LTE 

KLKSLANNYKDHflAENESLIKAVEESKNESSIOLSNLQNKID 5S3 

55 fluery: 7DT HLFflH— NRKLE — 

SVAEEHEILTKSYflELLflRNESTEKKNKDLOITCDSLNKfllETVKK 7bM 

+ Q N ++E S+ + E L K+ +L fl E K+ + ]> 

(21 +K+ 
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Sbjct: 22M SnSfiEKENFfllERGSIEKNIEflLKKTISDLEGTKEEIISKSDSSK--- 
DEYESQISLLKE 26D 

fluery: 7b5 

LNESLKE(2NEKSIA(3LIEKEEflRKEV(3NflLVI>REHKLANLH(3KTKV(3EEKIKTLt2KEREI> 62M 
E+ N++++ ++ E + R+E++ +L ++ L K + E+ +K 

+++ E 

Sbjct: 281 

KLETATTANDENVNKISELTKTREELEAELAAYKNLKNELETKLETSEKALKEVKENEEH 3M0 



fluery: 555 - 

KEETIDILRKELSRTE<2IRKELSIKASSLEVQKA<2LEGRLEEKESLVKL<2(3EELNK 660 

KEE I L KE + T+fl L SLE + L +L + + E + ++ 

+ N+ 

15 Sbjct: 3M1 LKEEKIfl- 

LEKEATETK(2«2LNSLRANLESLEKEHEDLAA(2LKKYEE<2IANKER<3YNE 31b 



Score = 156 (23-7 bits)-, Expect = Lle-Ol-, P = l-le-07 
Identities = 7M/2b6 (2750-. Positives = 13b/2b6 (.50'/.) 



CJuery: 516 EELIEKLtfSGPIVVKDfllCDVRISDiriDVYEri — KL- 
STLASKESRL«2DLLET~KALALA bS2 

+E -K++ 6+ ++ +++ Ell KL ST+ S E+ L+ +ET 

K+ 

25 Sbjct: b«15 

flESKAKVEEGLKKLEEESSKEKAELEICSKEnnKKLESTIESNETELKSSIIETIRKSDEKL 7S«4 

fluery: bS3 

(3ADRLIA(2HRCl3RTtlAETEARTLASMLREVERKNEELSVLLKAQ(2VESERA(2SDIEHLFi2 712 
30 ++A++UE LS+EE+ EEL L+ + S 

++ + L 

Sbjct: 755 Ei2SKKSAEEDIKNL(2HEKS — 
DLISRINESEKDIEELKSKLRIEAKSSSELETVKtfELNN 812 

35 fluery: 713 HNRKLESVAEEHEILTKSYHELLflRNESTEKKNKDL<2ITCDSLNK<2IET — 
VKKLNESLK 770 

K+ AEE+ +L KS +E ++R E K+K +1 + K++ T 

+K+L + L 

Sbjct: 613 AflEKIRVNAEENTVL-KSKLEDIER 

40 ELKDKdAEIKSNflEEKELLTSRLKELEdELD flb? 

fluery: 771 E(2NEKSIA<2LIEKEE<2RKEV<2N<3LVDR 

EHKLANLHflKTKVflEEKIKTLfiKEREDKEE 627 

+K Afl E EE R EV+ V++ + K L K K + 

45 +++ + ++ 

Sbjct: 6bfi STQflK — AC3KSE- 

EESRAEVRKFflVEKSflLDEKAMLLETKYNDLVNKEfJAWKRDEDTVKK R2M 

Query. 626 TIDILRKELSRTEfllRKEL-SIKASSLEVlSKAfiLEGRLE 6b5 
50 T D R+E+ E++ KEL ++KA + ++++A ERE 

Sbjct: 125 TTDSURflEI EKLAKELDNLKAENSKLKEAN-EDRSE 151 



Score = 155 (23-3 bits)-. Expect = 3.1e-07-, P - 3-=ie-07 
Identities = 73/2bT C275c)i Positives = 133/2b1 

duery: b21 DVYEMKLSTLASKESRLflD-LLETKAL ALA<3ADRLIA(2HRC(2RT<JAET 

EARTLASML 1*71 




+ + E K +T+ S L(2J> +L K ++L++ R + E + * 

R 

Sbjct: bH3 ELLEEKflNTIKS 

L(2DEILSYKDKITRNDEKLLSIERDSKRDLESLKEfiLRAA(2ESK blfl 

5 

(3uery: bfiD REVE 

RKNEELSVLLKA<J(2VESERA(3SI>IEHLF<2HNRKLESVAEEHEILTKSYI1ELL<2 ?3b 

+VE +K EE S KA + +S+ +E + N + E * 

KS +L Q 

10 Sbjct: bT? AKVEEGLKKLEEESSKEKAELEKSKEtlllKKLESTIESNET" 
ELKSSHETIRKSDEKLE<2 75b 

fluery: 737 RNESTEKKNKDL(3ITCDSLNK(2IETVKKLNESLKE<2 

NEKSIAQLIEKEEflRKEVflNfl 713 
15 +S E+ K+L(2 L +1 +K E LK + KS ++L 

+++ (J + 

Sbjct: 757 

SKKSAEEDIKNLflHEKSDLISRINESEKDIEELKSKLRIEAKSSSELETVKfiELNNAflEK fllb 

20 fluery: 71M L-VDREH 

KLANLHflKTKVflEEKIKTLCKEREDKEETIDILRKELSRTECIRKEL flMb 

+ V+ E KL ++ ++ K ++ +IK+ fl+E+E + L +EL 

T+(J + + 
Sbjct: 517 

25 IRVNAEENTVLKSKLEDIERELKDKUAEIKSNiaEEKELLTSRLKELEflELDSTflO-KAfllC 675 

fluery: 617 SIKASSLEV«2KA<3LE-GRLEEKESLVKLflflEEL-NK fiflO 

S + S EV+K <3+E +L+EK L++ + +L NK 
Sbjct: B?b SEEESRAEVRKFtfVEKSflLDEKAMLLETKYNDLVNK 111 



30 



Score = 1Mb (Sl-T bits), Expect = 3-Se-Ob, P = 3.5e-0b 
Identities = 73/311 (535:)-. Positives = 152/311 



(Juery: 5ED DNREfiVtfSGLRIL LEAAPLPDFPALV — 

35 LGESIAANNAYRflflETEHIPRK-IIPIiJfl 571 

+++ +V+ GL+ L E A L ++ L +1 +N + E I 

+ + 
Sbjct: bib 

ESKAKVEEGLKKLEEESSKEKAELEKSKEPIUKKLESTIESNETELKSSriETIRKSDEKLE 755 

40 

Query: 572 SSNHSFPTSIKCLTPHLKDGVPGLNIEEL- 
IEKLfiSGMVVKIXJICDVRISDIIIDVYEIIKL b30 

S S IK L D + +N E IE+L+S + + + + S 

++ + +L 

45 Sbjct: 75b tiSKKSAEEDIKNLdHEKSDLISRINESEKDIEELKSKLRI — : 

EAKSSSELETVKfiEL AID 

Query: b31 STLASK 

ESRLfiDLLETKALALAflADRLIA£?HRC<2RTl2AETEARTLASMLREVERKNE bo7 
50 + K + +L++K L +R+ ++ +E LS 

L+E+E++ + 

Sbjct: All NNAC2EKIRVNAEENTVLKSK 

LEBIERELKDKflAEIKSNflEEKELLTSRLKELEflELD flb? 

55 Query: bflfl 

ELSVLLKAfl(2VESERA(3SDIEHLF(3HNRKLESVAEEHEILTKSYnELL(3RNESTEKKNKl> 7M7 
S KAfl+ E E +++++ F(3 + + E+ +L Y +L+ + 

++ ++ 
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Sbjct: 8b8 — STQQKAQKSEEE-SRAEVRK-FQVEKS-- 
QLDEKAHLLETKYNDLVNKEQAUKRDEDT 121 

Query: 7MB LQITCDSLNKQIETVKKLNESLKEQNEKSIAQLIEKEEQRKEVQNQLV 

DREHKLANL 8D4 

++ T DS ++IE + K ++LK +N K L E E R E+ + ++ D 

+ K N 

Sbjct: 122 VKKTTDSQRQEIEKLAKELDNLKAENSK 

LKEANEDRSEIDDLMLLVTDLDEK — NA 175 

Query: 8DS HQKTKVQEEKIKTLQKEREDKEETID 830 

++K+++ ++ E +D+EE D 
Sbjct: 17b KYRSKLKDLGVEISSDEEDDEEEEDD 1001 

15 Score = 14b (21-1 bits)-. Expect = 4-be-Db-. P = 4-be-Ob 
Identities = 82/313 (2b*)-. Positives = 145/313 <4b*) 

Query: 518 

EELIEKLQSGMVVKDQICDVRISDIflDVYErilCLSTLASKESRLQDLLETKALALAQADRL bS7 
20 EEL +L + +K+++ + + E+K + KE ++U LE +A 

Q 

Sbjct: 304 EELEAELAAYKNLKNELETKLETSEKALKEVKENEEHLKEEKIQ-- 
LEKEATETKQQ 358 

25 Query: b58 IAQHRCQRTQAETEARTLASMLREVERK NEELSVL 

LKAQQVESERAQSD 70b 

+ R EE LA+ L++ E + NEE+S L + + Q 

E+E + 

Sbjct: 351 

30 LNSLRANLESLEKEHEDLAAQLKKYEEQIANKERQYNEEISQLNDEITSTQQENESIKKK ma 

fluery: 707 IEHLFQHNRKLESVAEEHEILTKSYMELLQRN- 
ESTEKKNKDLQITCDSLNKQIET-VKK 7b4 

+ L + ++S +EE L KS ++ L + +KKN+ + + K 

35 IE+ K 

Sbjct: Nil 

NDELEGEVKAMKSTSEEQSNLKKSEIDALNLQIKELKKKNETNEASLLESIKSIESETVK 178 

Query: 7b5 LNESLKEQN — EKSIACJLIEK EEQRKEVQNQLVDREHKLAN-LHQKT-- 

40 -KVQEEKI 815 

+ E EN EK +++L +K E + +L K+ L KT 

K+Q EK+ 
Sbjct: 471 

IKELQDECNFKEKEVSELEDKLKASEDKNSKYLELQKESEKIKEELDAKTTELKIQLEKV 538 

45 

Query: 81b 

KTLQKEREDKEETIDILRKELSRTEQIRKELSIKASSLEVQKAQLEGRLEEKESLVKLQQ 875 
L K +E E ELSR ++K S + + E Q +L+ ++ K 

+ ++ 

50 Sbjct: 531 TNLSKAKEKSES— ELSR 

LKKTSSEERKNAEEQLEKLKNEIQIKNQAFEKER 588 

Query: 87b EELNKHSHnTAMIHSLSGGKINPETVNL 103 
+ LN+ SI +S + E + L 

55 Sbjct: 581 KLLNEGSSTITQEYSEKINTLEDELIRL bib 

Score = 145 (21-8 bits)-. Expect = 5.1e-0b-, P = S.1e-0b 
Identities = 51/24b (23*)-. Positives = ll5/24b (4b*) 
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fluery: b3M ASKESRL<2- 

DLLETKALALAflADRLIAflHRCORTfl AETEARTLASMLREVERKNEELSVL b12 

+ ES +0 L+ K ++ + (2 + +R E L + ++E+ 

5 EE ++ 

Sbjct: 207 SKNESSI<2LSNL(2NKIDSMSc2EKE 

NF<2IERGSIEKNIE<2LKKTIS1>LE(2TICEE — II 2bl 

CJuery: b13 LKACJOVESERAflSDIEHLFdHNRKLESVAEEHEI 

LTKSYMELLflRNESTEKKNKD 747 

K+ +E+SIL+ ++A++ LTK+ EL 

+ + + 

Sbjct: 2b2 SKSDSSKDEY-ES(2IS- 

LLKEKLETATTANDENVNKISELTKTREELEAELAAYKNLKNE 311 
fluery: 7Mfl 

L<2ITCDSLNK(3IETVKKLNESLKE(3NEICSIAflLIEKEEflRKEVaN<2LVDREHKLANLH(2K SD7 
L+ ++ K ++ VK+ E LKE+ + + E ++<2 ++ L E 

+ +L + 
Sbjct: 350 

LETKLETSEKALKEVKENEEHLKEEKIflLEKEATETKflflLNSLRANLESLEKEHEDLAAQ 371 
duery: flDfl 

TKVflEEKIKTLflKEREDKEETIDILRKELSRTEfllRKELSIKASSLEVflKAflLEGRLEEK fib? 
25 K EE+I KER+ EE I L E++ T+<2 + + K LE + 

++ EE+ 

Sbjct: 3flD LKKYEEfllAN— KERflYNEE- 
IS(JLNDEITST(2(2ENESIKKKN]>ELEGEVKAnKSTSEEa 43b 

30 Query: Aba ESLVKL<2<2EELN 871 

+L K + + LN 
Sbjct: 437 SNLKKSEIDALN MMfl 

Score = 137 (20-b bits)i Expect = M.2e-D5-. P = 4.2e-DS 
35 Identities = 81/312 (B5JC) i Positives = 140/312 (445c) 

Query: 518 EELIEKLflSGO VVKDfllCDVRISDItlDVYEIIKLSTLASK-ESRLODLLET- 
KALALAOAS bSS 

+EL ++++ ++ +++ S+I D +++ L K E+ LLE+ 

40 K++ 

Sbjct: 420 DELEGEVKAMKSTSEE<2SNLKKSEI- 
DALNLtJIKELKKKNETNEASLLESIKSIESETVK 478 

Query: bSb 

RLIA<2HRC(3RT(2AETEARTLASnLREVERKNEELSVLLKA(2<2VESERA<2S])IEHLF<2HNR 71S 

<2 C EE L L+EKN+ L K + E + 

L 

Sbjct: 471 IKELflDECNFK — 

EKEVSELEDKLKASEDKNSKYLEL(2KESEKIICEELDAKTTELKI<2LE S3b 
Query: 71b 

KLESVAEEHEILTKSYMELLt3RNESTEKfCNKDL(2ITC])SLNIC£2IETVKKLNESLKE(3NEK 77S 
K + ++++ e ++S + L++ S E+KN + (2+ (21+ + + 

K NE 

Sbjct: S37 KVTNLSKAKE-KSESELSRLKKTSSEERKNAEE(2LEKLKNEli2IKN- 
<2AFEKER<LLNEG 514 
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fluery: 7?b SIA<2LIEKEE<3RKEVc2Nl2LV-- DREHKL-ANLH(3KTKV<2EEKIKTL£?KER- 
EDKEETIDI 631 

S E E+ ++++L+ E++L A T+ + EK+ E 

E+K+ TI 
5 Sbjct: 515 

SSTITQEYSEKINTLEDELIRLflNENELKAKEIDNTRSELEKVSLSNDELLEEKQNTIKS bSM 

fluery: 835 LRKE-LSRTE<3I RKELSIKASS LEVflKAtJLEGRLEEK 

ESLVKL<2(2E 67b 

10 L+ E LS ++I K LSI+ S LE K (2L E K EL 

KL++E 

Sbjct: b55 

LflDEILSYKDKITRNDEKLLSIERDSKRDLESLKEflLRAAfiESKAKVEEGLKKLEEESSK 71H 

15 Query: 677 ELNKHSHMIAMIHS 610 

EL K 11+ + S 
Sbjct: 715 EKAELEKSKEflMKKLES 731 

Score = 156 (11-5 bits)-. Expect = 3.'«ie-Cm-i P = 3-le-OM 
20 Identities = 60/3Sb (55"/.), Positives = lM6/35b (11!!) 

fluery: 5Mb LGESIAANNAYRflflETEHIPRKPIPUflSSNHSFPTSIKCLTPHL 

KDGVPGLN-I 517 

L E + ++ E+ + ++S+ H SIK L L K 

25 G+N + 

Sbjct: 55 

LDEHTflLRDVLETKDKENflTALLEYKSTIHKdEDSIKTLEKELETILSflKKKAEDGINKPJ AM 

Query: 5=16 EELIEKLQSGtlVVKDQICD — 
30 VRISDIM]>VYEMKLSTLASKESRL<3DLLETKALALA<3AD b55 

+ +L 11 ++C + D+V KT + KE + E 

KA+ + 

Sbjct: 65 GKDLFALSREMflAVEENCKNL£2KEKDKSNVNH<2K- 

ETKSLKEDIAAKITEIKAIN-ENLE IMS 

35 

Query: b5b 

RLIAQHRCQRTQAETEARTLASMLREVERKNEELSVLLKAQQVESERAQSDIEHLFQHNR 715 
++ Q C EE ++ L E + + + L+ + + ++ 

+ + N 

40 Sbjct: 1M3 KIIKIQ — CNNLSKEKEH — 

ISKELVEYKSRFQSHDNLVAKLTEKLKSLANNYKDI1QAENE llfl 

Query: 71b KLESVAEEHEILTKSYMELLQRN- 
ESTEKKNKDLQITCDSLNKQIETVKKLNESLKEQNE 77M 
45 L EE + + + LQ +S ++ ++ (31 S+ K IE +KK 

L++ E 

Sbjct: 111 

SLIKAVEESKNESSIQLSNLQNKIDSflSQEKENFQIERGSIEKNIEQLKKTISDLEQTKE 556 
50 Query: 775 

KSIAQLIEKEEQRKEVQNQLVDREHKLANLHQKTKVQEEKIKTLQKEREDKEETI AS1 

+ I++ + + E ++Q+ + KL KI L K RE+ E 

+ 

Sbjct: 551 EIISK 

55 SDSSKDEYESfllSLLKEKLETATTANDENVNKISELTKTREELEAELAAYKN 315 

Query: 63D --DILRKELSRTEQIRKELSIKASSLEVQKAQLEGRLEE-KESLVKLQQ— 
EELNK-HSH 663 
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+ L +L +E+ KE+ L+ +K ALE E K+ L L+ E 

L K H 

Sbjct: 31b 

LKNELETKLETSEKALKEVKENEEHLKEEKIflLEKEATETKflflLNSLRANLESLEKEHED 37S 

5 

(Juery: fifiM MIAMI fiflfl 
+ A + 

Sbjct: 37b LAAflL 360 

10 Score = 117 (17-b bits)-. Expect. = 3-fie-03-i P = 3.fie-03 
Identities = SD/B4D (BDX ) -. Positives = 111/BMO (Mb*) 

Query- b3M 

ASKESRLflDLLETKALALAflADRLIAflHRCflRTflAETEARTLASMLREVERKNEELSVLL b"=13 
15 A E L+ L E + A+ ++ + + E+ L S + + + 

+ E+L 

Sbjct: til 

AKVEEGLKKLEEESSKEKAELEKSKEMMKKLESTIESNETELKSSMETIR<Sl>EICLEflSK 7Sfl 

20 fluery: b^H KAflflVESERAfl SD- 

IEHLFflHNRKLESVAEEHEILTKSYMELLflRNESTEKKNKDLfl 711 

K+ + + + £} s» I + + + +E + + I KS EL + 

+ + + 

Sbjct: 75T 

25 KSAEEDIKNLflHEKSDLISRINESEKDIEELKSKLRIEAKSSSELETVKflELNNAflEKIR fllfi 
fluery: 7SD 

ITCDSLNKfllETVKKLNESLKEflNEKSIAQLIEKEEflRKEVflNflLVDREHKLANLHflKTK SOT 
+ + N +++ KL + +E +K A++ +E+++ + ++L + E +L 

30 + <3K + 

Sbjct: fill VNAEE-NTVLKS--KLEDIERELKDKC3- 
AEIKSNflEEKELLTSRLKELEflELDSTflflKAfl 674 

Ouery: AID VflEEK 

35 IKTLflKEREDKEETIDILRKELSRTEfllRKELSIKASSLEVflKAflLEGRLE fibS 

EE+ ++ 0 E+ +E +L E + + KE + K V+K 

+ + + 

Sbjct: A75 KSEEESRAEVRKFflVEKSflLDEKAMLL-- 
ETKYNDLVNICEflAtiJKRDEDTVKKTT-DSflRfl =131 

40 

<3uery: flbb EKESLVK fl?B 

E E L K 
Sbjct: 13B EIEKLAK 13fl 

45 Score = 1M Clb-4 bits)-. Expect = B-be-DS-. P = B.Se-OB 
Identities = b4/Efi4 (BE*)-, Positives = 135/BflM (47*) 

fluery: STfl 

EELIEKLflSGMVVKDfllCDVRISDIMDVYEMKLSTLASKESRLflDLLETKALALA (2 A b54 

50 +E+++KL+S ++ +1 E +SE +++L K+ 

++ ++ 

Sbjct: 7B3 

KEMMKKLESTIESNETELKSSMETIRKSDEKLEflSKKSAEEDIKNLflHEKSDLISRINES 7fiE 

55 fluery: bSS DRLIAflHRCfl- 

RTflAETEARTLASMLREVERKNEELSVLLKAflflVESERAflSDIEH-LFfl 71B 

++ I + + + R +A++ + L + + +E+ E++ V + V + + 

DIE L 
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Sbjct: 7B3 EKDIEELKSKLRIEAKSSSE-LETVK(3ELNNA<2EKIRvNAEENTVLKSKLE- 
DIERELKD AMD 

fiuery: 713 HNRKLESVAEEHEILTKSYnELLdRNESTEKK-NKDLCITCDSLNK- 
5 (2IETVKKLNES — ?bfl 

+++S EE E+LT EL <3 +ST++K K + + + K fl+E 

+L+E 

Sbjct : am K<3AEIKSNflEEKELLTSRLKELEflELI>ST<2flKA<2KSEEESRAEVRKF(2VEK- 
SflLDEKAN 6=1=5 

10 

fluery: 7b^ LKEflNEKSIA ULIEKEEfl— 

RKEVdNflLVDREHKLANLHflKTKVflEEKIKTLflKERE 623 

L E + a +++E +K +<2 + E KLA K + 

K+K ++R 

15 Sbjct: 100 LLETKYNDLVNKEflAlilKRDEDTVKKTTDSflRflEIE- 
KLAKELDNLKAENSKLKEANEDRS ^58 

Query: ASM DKEETI DILRKELSRTE(3IRKELSIKASSLEV(3KA(3LE6RLEEKE 

flbfl 

20 ++++»+ K ++ K+L ++ SS E + E E+ E 

Sbjct-' 1S1 EIDDLMLLVTDLDEKNAKYRSKL-KDLGVEISSDEEDDEEEEDDEEDDE 
10 Ob 

Score = lb (14. M bits)-! Expect = l.le+DO-, P = b-be-Dl 
25 Identities = 40/210 <l*i'/.) Positives = 101/E10 <4fl>:) 

fluery: bfll EVERKN — 

EELSVLLKA<2(JVESERA(3SDIEHLF<3HNRKLESVAEEHEILTKSYi1ELL<2RN 73fl 

EEKN+L+++V +++ L++ + ++LK 

30 +L + 

Sbjct: is 

ETELKNVRPSLl>EnT«3LR]>VLETKDKENi3TALLEYKSTIHK(3EDSIKTLEKELETILSi3K 74 

fluery: 73 c l ESTE 

35 KKNKDLflITCl>SLNK(2IETVKKLNESLKE(2NEKSIA<2LIEKEE(2RKEV<JN<iL 7*14 

+ E K KDL +L+++++ v++ ++|_ +++ + +++ 

K ++ + 

Sbjct: ?s KKAEDGINKHGKDLF ALSREMflAVEENCKNLflKEKDKSN— 

VNHOKETKSLKEDI 127 

40 

fluery: 715 VDREHKL ANLHflKTKViJEEKIICTLaKERED- 
KEETIDILRKELSRTElJIRKELSIKASSL 553 

+ ++ +++ + + + L KE+E +E ++ + S + K 

L+ K SL 

45 Sbjct: 12fl AAKITEIKAINENLEKtlKIflCNNLSKEKEHISKELVEYKSRFflSHDNLVAK- 
LTEKLKSL Iflfc 

Cuery: 654 EVflKAflLEGRLEEKESLVKLtJAEELNKHSHHIAfllHS BIO 

++ E ESL+K +E N+ S ++ + + 
50 Sbjct: 1S7 ANNYKDMdA ENESLIKAVEESKNESSI(3LSNL(2N 2B0 

Score = 52 (7-fi bits)-, Expect = B-Oe-lO, P = B.Qe-10 
Identities = 3T/lb7 (23JO-« Positives = 74/lb? (44X) 

55 fluery: =H LNSVLAGVVCRSSHTDSVFLflCIflLLflKLTYNVKIFYSGANIDEL- 
ITFLIDHIflSSEDE 157 

LN + + ++ ++ L+ 1+ ++ T +K N E ++ L D 

+++SEP+ 
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Sbjct: m*47 

LNLfllfCELKKKNETNEASLLESIKSIESETVKIKELflPECNFKEKEVSELEDKLKASEDK 50b 
Guery: ISA - 

5 LKnPCLGLLANLCRHNLSVtSJTHIKTLSNVKSFYRTLITLLAHSSLTVVVP ALSILSSLT 21b 

K L + +L+T T ++ T++ S + + 

S 

Sbjct: 5D7 NSKYLELflKESEKIKEELDAKT — 
TELKIdLEKVTNLSKAKEKSESELSRLKKTSSEER 5b3 

10 

Query: 217 LN-EEVGEKLFHARNI-H(3TF(3LIFNILINGDGTLTRKYS— VDLLMDLL 
Bb2 

N EE EKL + I +(2 F+ +L G T+T++YS ++ L D L 
Sbjct: 5bH KNAEE(2LEKLKNEI(2IKNl3AFEKERKLLNEGSSTIT(3EYSEKINTLEDEL 
15 bl3 



Pedant information for DKFZphmel2_12jl, frame 2



Report for DKFZphmel2_12jl.2



[LENGTH] 905
25 EMU! 1020b7-ai 
CpIJ S.fi5 

CHOriOLJ TREMBL : SCINTANA_1 Saccharomyces cerevisiae 

integrin analogue gene-, complete cds. le-m 

CFUNCAT3 Ofi.D? vesicular transport (golgi network-, etc-) CS. 
30 cerevisiae-, YDLDSflwJ 5e-lb 

CFUNCATJ 30-03 organization of cytoplasm ITS- cerevisiae-, 
YDLOSflwl 5e-lb 

CFUNCAT3 1 genome replication-, transcription-, recombination and 
repair CM- jannaschii-, MJ1322J le-10 
35 CFUNCATJ 0^-10 nuclear biogenesis IS. cerevisiae-, YDR35bwll 

2e-10 

CFUNCATJ 30-OH organization of cytoskeleton ES- cerevisiae-, 

YDR35bwID 2e-10 

CFUNCATJ 03-22 cell cycle control and mitosis CS- cerevisiae-, 
40 YDR35buO 2e-10 

EFUNCAT3 30-10 nuclear organization IES- cerevisiae-, YKROTSwH 
le-OT 

EFUNCATJ 11-0H dna repair (direct repair-, base excision repair 
and nucleotide excision repair) CS- cerevisiae-, YKRO^Swl le-OT 
45 CFUNCAT3 Ofi-22 cy toskeleton-dependent transport IS- cerevisiae-, 
YHR023W nYOl - myosin-1 isoformJ 4e-0^ 

EFUNCATJ 03-OM budding-, cell polarity and filament formation 

CS- cerevisiae-i YHR023W MY01 - myosin-1 isoformJ He-DT 
CFUNCAT3 03-25 cytokinesis CS • cerevisiae-, YHR023w MY01 - 
50 myosin-1 isoforml He-OT 

(EFUNCAT3 "H unclassified proteins ICS. cerevisiae-, YNLO^luO 

3e-06 

EFUNCATJ 0^-25 vacuolar and lysosomal biogenesis ITS- 
cerevisiae-, Y0R32bw3 be-0a 
55 EFUNCAT3) Dfi-lb extracellular transport CS- cerevisiae-, 

Y0R32buO be-Ofl 

IEFUNCAT3} 0^-13 biogenesis of chromosome structure flZS - 

cerevisiae-. YLROflbwJ fle-oa 
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CFUNCAT3 ^fl classification not yet clear-cut ICS. cerevisiae-, 

YJR134c3 le-07 

CFUNCATJ Ob-07 protein modification (glycolsylation-i acylation-. 
myristylation-. palmitylation-i f arnesylation and processing) 
5 CS- cerevisiae-. YKL201c3 4e-D7 

CFUNCAT3 30-05 organization of centrosome CS- cerevisiae-. 
YIL144w3 4e-0b 

CFUNCAT3 03-07 pheromone response-i mating-type determination-, 
sex-specific proteins CS- cerevisiae-i YNL07Tc3 5e-0b 
10 CFUNCAT J 03-01 cell growth ILS. cerevisiae-. YNLD7^c]l 5e-0b 

CFUNCAT3 Oa-n other intracellular-transport activities CS- 
cerevisiaei YNL07 c !c3 Se-Ob 

CFUNCATJ 0^-04 biogenesis of cytoskeleton ICS- cerevisiae-. 
YKL17^c3 be-Ob 

15 CFUNCAT3 30-02 organization of plasma membrane IS- cerevisiae-. 
YEROQflcJ fle-Ob 

CFUNCAT3 03-1T recombination and dna repair CS- cerevisiaei 

YNL250w3 le-05 

CFUNCAT3 03-13 meiosis CS - cerevisiae-. YDR2B5W3 le-05 
20 CFUNCAT3 30-13 organization of chromosome structure CS- 
cerevisiae-. YDR2a5w3 le-05 

CFUNCAT3 11-01 stress response CS- cerevisiae-. YPR141c3 2e-05 
CFUNCAT3 Ob- 10 assembly of protein complexes CS- cerevisiae-. 

YPR141c3 2e-05 

25 CFUNCAT3 Ob-01 protein folding and stabilization CS. 
cerevisiae-. YNL227c3 *!e-05 

CFUNCAT3 05-04 translation (initiation! elongation and 
termination) CS- cerevisiae-. YAL035w3 le-04 

CFUNCAT3 10. OS- 11 ! 11 ] other pheromone response activities CS- 
30 cerevisiaen YHR15ac3 le-04 

CFUNCAT3 o chaperones CM - genitalium-. MG3553 2e-04 
CFUNCAT3 03-22-01 cell cycle check point proteins CS - 

cerevisiae-. YGL0abw3 2e-04 

CFUNCAT3 03-10 sporulation and germination CS - cerevisiae-. 
35 YNL225c3 3e-04 

CFUNCAT3 r general function prediction CM- jannaschii-, 

MJ12543 4e-04 

CFUNCAT3 Oa-01 nuclear transport CS- cerevisiae-. YPL174c3 Me-04 
CFUNCAT3 04-05. 01-D1 general transcription activities CS- 
40 cerevisiae-. YMR227c TAFb? - TFIID subunitJ be-04 
CBL0CKS3 PR01002E 

CBL0CKSJ BLOllbOB Kinesin light chain repeat proteins 

CBL0CKS3 BL0032bD Tropomyosins proteins 

CSC0P3 d2tmab_ 1-105-4-1-1 Tropomyosin Crabbit 

45 (Oryctolagus cuniculus) 3e-23 

CEC3 3- b. 1-32 Myosin ATPase 4e-10 

CPIRKUJ nucleus Se-OI 

CPIRKU3 phosphotransferase 2e-07 

CPIRKtiU blocked amino end le-Ob 

50 CPIRKLO duplication 2e-07 

CPIRKUJ citrulline 3e-0a 

CPIRKUJ tandem repeat 4e-10 

CPIRKUJ heterodimer le-07 

CPIRKUJ heart 4e-0a 

55 CPIRKUJ endocytosis 7e-0a 

CPIRKUJ transmembrane protein le-14 

CPIRKUJ serine/threonine-specif ic protein kinase 2e-07 

CPIRKUJ cell wall 2e-0b 
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EPIRKliO 
EPIRKU3 
EPIRKU3 
EPIRKU3 
5 EPIRKU3 
EPIRKtO 
EPIRKliO 
EPIRKUJ 
EPIRKliO 

10 EPIRKliO 
EPIRKliO 
EPIRKIO 
EPIRKLO 
EPIRKU3 

15 EPIRKliO 
EPIRKliO 
EPIRKliO 
EPIRKliO 
EPIRKliO 

20 EPIRKliO 
EPIRKliO 
EPIRKliO 
EPIRKbO 
EPIRKliO 

25 EPIRKliO 
EPIRKliO 
EPIRKU3 
EPIRKliO 
EPIRKU3 

30 EPIRKU3 
EPIRKLO 
EPIRKliO 
EPIRKLO 
EPIRKLO 

35 EPIRKLO 
ESUPFAM3 
ESUPFAM3 
07 

ESUPFAfO 

40 ESUPFAI13 
ESUPFAfO 
ESUPFAI13 
ESUPFAH3 
ESUPFAfO 

45 ESUPFAfO 
ESUPFAfO 
ESUPFAfO 
ESUPFAfO 
ESUPFAfO 

50 ESUPFAfO 
ESUPFAfO 
ESUPFAfO 
ESUPFAfO 
ESUPFAfO 

55 ESUPFAfO 
ESUPFAfO 
ESUPFAI13 
ESUPFAfO 
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zinc finger 7e-0fl 
DNA binding 3e-0^ 
metal binding 7e-Dfi 
muscle contraction L4e-10 
brain Ee-0fc> 

acetylated amino end Ee-D7 
heterotetramer 5e-07 
actin binding He-ID 
mitosis le-Dfl 
microtubule binding le-Dfl 
• ATP He-ID 

chromosomal protein le-D7 
thick filament Te-10 
phosphoprotein le-OT 
skeletal muscle le-Ofl 
calcium binding 3e-0fi 
alternative splicing Te-ID 
DNA condensation le-07 
coiled coil le-m 
P-loop Ee-ID 
heptad repeat 5e-DT 
methylated amino acid Me-10 •* 
immunoglobulin receptor Ee-D7 
peripheral membrane protein 7e-0fl 
cardiac muscle Me-Dfl 
hydrolase Me-ID 
microtubule 5e-DT 
muscle Me-Dfi 
membrane protein Se-DT 
EF hand 3e-0fi 
cell division le-Ob 
cytoskeleton be-DT 
hair 3e-0fl 

calmodulin binding 7e-0fl 
Golgi apparatus Ee-D7 
hypothetical protein YJLD7 t 4c 5e-0^ 

unassigned Ser/Thr or Tyr-specific protein kinases Ee- 



myosin motor domain homology Ee-ID 
alpha-actinin actin-binding domain homology te-DT 
tropomyosin Ee-Ofl 
kinesin heavy chain Se-D7 
plectin Le-DT 
SAM homology le-Db 
trichohyalin 3e-0fl 

ribosomal protein SID homology be-OT 
protein kinase C zinc-binding repeat homology Se-DT 
giantin 7e-Dfi 

protein kinase homology Ee-07 

protein H-l membrane-binding domain homology Te-Dfi 
human early endosome antigen 1 7e-0fl 
myosin flYOE Ee-0b 
MS protein Se-O'! 

Mycoplasma genitalium hypothetical protein MGElo Se-DT 
myosin heavy chain Ee-ID 

conserved hypothetical PUS protein Se-DT 
centromere protein E le-Dfi 
calmodulin repeat homology 3e-0fl 
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ESUPFACO hypothetical protein MJCHIM 2e-D7 

ESUPFAM3 hypothetical protein I1J13SS 3e-Cn 

ESUPFAM3 pleckstrin repeat homology 5e-CH 

CSUPFAM3 kinesin motor domain homology le-Ofl 

ESUPFAMU ezri-n v ^fe*M 

EPR0SITE3 LEUCINE_ZIPPER 1 

IKW1 TRANSMEMBRANE 5 

IKUJ L0U_C0MPLEXITY 3.0=1 Z 

EKIO C0ILED_C0IL 1B-3H '/. 
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SEfl MDSTACLKSLLLTVSflYKAVKSEANATflLLRHLEVISGflKLTRLFTSNfllLTSECLSCLV 

SEC xxxxxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccceeeeeecceeeeeehhhhhh 
15 COILS 



MEM 

SEfl ELLEDPNISASLILSIIGLLSflLAVDIETRDCLflNTYNLNSVLAG VVCRSSHTDSVFLflC 
20 SEG xxxx- • - xxxxxxxxxxxxxx 

PRD hhhhccccchhhhhhhhchhhhhhhhhhcccccccccceeeeeeeeeeeccccccchhhh 
COILS 



MEM ...... . MMMMMMMMMMMMMMMMM 

25 

SEfl IflLLflKLTYNVKIFYSGANIDELITFLIDHIflSSEDELKMPCLGLLANLCRHNLSVflTHI 

SEG 

PRD hhhhhhhcceeeeeecccchhhhhhhhhhhhhhhhhhhccccceeeeeeeecceeeeeee 
COILS 

30 

MEM 

SEfl KTLSNVKSFYRTLITLLAHSSLTV VVFALSILSSLTLNEEVGEKLFHARNIHflTFflLIFN 
SEG 

35 PRD eeeehhhhhhhhhhhhhhcccccccceeehhhhhhchhhhhhhhhhhhhhhhhhhhhhhh 
COILS 



MEM • MMMMMMMMMMMMMMMMM 

40 SEfl ILINGDGTLTRKYSVDLLMDLLKNPKIADYLTRYEHFSSCLHflVLGLLNGKDPDSSSKVL 
SEG 

PRD hhcccccceeeehhhhhhhhhhccccchhhhhheeeeehhhhhhhhcccccccccchhhh 
COILS 



45 MEM 

SEfl ELLLAFCSVTflLRHMLTflMMFEflSPPGSATLGSHTKCLEPTVALLRULSflPLDGSENCSV 
SEG i 

PRD hhhhhchhhhhhhhhhhhhhhhccccccccccccceeehhhhhhhhhhhcccccccchhh 
50 COILS 



MEM 

SEfl LALELFKEIFEDVIDAANCSSADRFVTLLLPTILDflLQFTEflNLDEALTRKNVKGIAKAI 
55 SEG 

PRD hhhhhhhhhhhhhhhhcccccchhhhhheeehhhhhhhhhhhhhhhhhhhhhchhhhhhh 
COILS 
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hem 

se(3 evlltlcgddtlkmhiakilttvkcttliefiflftygkidlgfgtkvadselcklaadvil 

SEG 

5 PRD hhhhhhccccchhhhhhhhhhhheeeeeeeeeeecccccccccceeehhhhhhhhhhhhh 
COILS 



mem 



10 SEfl KTLDLINKLKPLVPGMEVSFYKIL(2DPRLITPLAFALTSDNREQV<3SGLRILLEAAPLPD 

SEG 

PRD hhhhhhhhcccccccccccceeeccccccchhhhhhhccccchhhhhhhhhhhhhccccc 
COILS 



15 MEM 



SE(2 FPALVLGESIAANNAYR<2(2ETEHIPRKMPIiJ<2SSNHSFPTSIKCLTPHLKDGVPGLNIEEL 
SEG 

PRD cceeeeehhhhhhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhli 
20 COILS 



MEM 



SEfl IEKLi2SGMVVJCDr3ICDVRISDIMDVYEMKLSTLASKESRL(3DLLETKALALAi2ADRLIAl3 
25 SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

MEM 

SEfl HRCURTflAETEARTLASMLREVERKNEELSVLLKAflflVESERAOSDIEHLFflHNRKLESV 
SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

35 CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

MEM . ... 

SE<2 AEEHEILTKSYMELLt3RNESTEKKNKDL(2ITCDSLNKl3IETVKKLNESLKE(2NEKSIA(3L 

40 PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh^ 
COILS 

• -ccccccccccccccccccccccccccccccccccccccccccccccccc 

MEM 

45 SEfl IEKEEORKEVflNflLVDREHKLANLHUKTKVflEEKIKTLiaKEREDKEETIDILRKELSRTE 
SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

•• • • CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

50 MEM 

SEC flIRKELSIKASSLEVflKA(2LEGRLEEKESLVICLfl(3EELN<HSHMIAMIHSLSGGKINPET 
SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccc 
55 COILS 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC - 

MEM 
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SEQ VNLSI 

SEG 

PRD ccccc 

COILS 

5 MEM 



Prosite for DKFZphmel2_12j1.2

PS00029 331->353 LEUCINE_ZIPPER PDOC00029

(No Pfam data available for DKFZphmel2_12j1.2)

group: intracellular transport and trafficking



DKFZphmel2_7gm encodes a novel 973 amino acid protein with
similarity to the dor (deep orange) protein of Drosophila
melanogaster.

The novel protein is also similar to the vacuolar membrane
protein PEP3 of Saccharomyces cerevisiae, which is involved in
protein sorting mechanisms. The expression profile is ubiquitous
and a role in protein transport/targeting is likely.


The new protein can find application in modulation of the sorting 
of proteins into different compartments. 



similarity to DEEP ORANGE (Drosophila melanogaster)
perhaps complete cds and full length
Sequenced by MediGenomix

Locus: unknown

Insert length: 3951 bp

Poly A stretch at pos. 3893, polyadenylation signal at pos. 3874

1 GCCCGCGTCA CGGGGGCGGG AGTCAGCTGA GCTGCCGGGG CGAGGTTGGG
51 ATCACCTGGC ACCGGCTGAA GGGAGCCTGT GATTTTTTTG TAGCGGGGGC
101 GGGGAGTAAG GTGCAAGACT GCGCCAGATT CAAGGACGAG GGCTGCCCGA
151 TTATCTCGCT GCATAAGGCA AGAGCAAGAG GATCCTCAGG ATTTTAAAGA
201 GGAGGCGACG GCTGCAGGTT CCCAGGATCT GTCAGAGGCT GGGGAGTTAC
251 AGCTTCCATT CTGGGGCGAC GGGGACCCCG GGGGGGTAGC CCTTTTGTAA
301 TCCCCAGGCC CCGGACAAAG AGCCCAGAGG CCGGGCACCA TGGCGTCCAT
351 CCTGGATGAG TACGAGAACT CGCTGTCCCG CTCGGCCGTC TTGCAGCCCG
401 GCTGCCCTAG CGTGGGCATC CCCCACTCGG GGTATGTGAA TGCCCAGCTG
451 GAGAAGGAAG TGCCCATCTT CACAAAGCAG CGCATTGACT TCACCCCTTC
501 CGAGCGCATT ACCAGTCTTG TCGTCTCCAG CAATCAGCTG TGCATGAGCC
551 TGGGCAAGGA TACACTGCTC CGCATTGACT TGGGCAAGGC AAATGAGCCC
601 AACCACGTGG AGCTGGGACG TAAGGATGAC GCAAAAGTTC ACAAGATGTT
651 CCTTGACCAT ACTGGCTCTC ACCTGCTGAT TGCCCTGAGC AGCACGGAGG
701 TCCTCTACGT GAACCGAAAT GGACAGAAGG TACGGCCACT AGCACGCTGG
751 AAGGGGCAGC TGGTGGAGAG TGTGGGTTGG AACAAGGCAC TGGGCACGGA
801 GAGCAGCACA GGCCCCATCC TGGTCGGGAC TGCCCAAGGC CACATCTTTG
851 AAGCAGAGCT CTCAGCCAGC GAAGGTGGGC TTTTCGGCCC TGCTCCGGAT
901 CTCTACTTCC GCCCATTGTA CGTGCTAAAT GAAGAAGGGG GTCCAGCACC
951 TGTGTGCTCC CTTGAGGCCG AGCGGGGCCC TGATGGGCGT AGCTTTGTTA
1001 TTGCCACCAC TCGGCAGCGC CTCTTCCAGT TCATAGGCCG AGCAGCAGAG
1051 GGGGCTGAGG CCCAGGGTTT CTCAGGGCTC TTTGCAGCTT ACACGGACCA
1101 CCCACCCCCA TTCCGTGAGT TTCCCAGCAA CCTGGGCTAC AGTGAGTTGG
1151 CCTTCTACAC CCCCAAGCTG CGCTCCGCAC CCCGGGCCTT CGCCTGGATG
1201 ATGGGGGATG GTGTGTTGTA TGGGGCATTG GACTGTGGGC GCCCTGACTC
1251 TCTGCTGAGC GAGGAGCGAG TCTGGGAGTA CCCAGAGGGG GTAGGGCCTG
1301 GGGCCAGCCC ACCCCTAGCC ATCGTCTTGA CCCAGTTCCA CTTCCTGCTG
1351 CTACTGGCAG ACCGGGTGGA GGCAGTGTGC ACACTGACCG GGCAGGTGGT
1401 GCTGCGGGAT CACTTCCTGG AGAAATTTGG GCCGCTGAAG CACATGGTGA
1451 AGGACTCCTC CACAGGCCAG CTGTGGGCCT ACACTGAGCG GGCTGTCTTC
1501 CGCTACCACG TGCAACGGGA GGCCCGAGAT GTCTGGCGCA CCTATCTGGA
1551 CATGAACCGC TTCGATCTGG CCAAAGAGTA TTGTCGAGAG CGGCCCGACT
1601 GCCTGGACAC GGTCCTGGCC CGGGAGGCCG ATTTCTGCTT TCGCCAGCGT
1651 CGCTACCTGG AGAGCGCACG CTGCTATGCC CTGACCCAGA GCTACTTTGA
1701 GGAGATTGCC CTCAAGTTCC TGGAGGCCCG ACAGGAGGAG GCTCTGGCTG
1751 AGTTCCTGCA GCGAAAACTG GCCAGTTTGA AGCCAGCCGA ACGTACCCAG
1801 GCCACACTGC TGACCACCTG GCTGACAGAG CTCTACCTGA GCCGGCTTGG
1851 GGCTCTGCAG GGCGACCCAG AGGCCCTGAC TCTCTACCGA GAAACCAAGG
1901 AATGCTTTCG AACCTTCCTC AGCAGCCCCC GCCACAAAGA GTGGCTCTTT
1951 GCCAGCCGGG CCTCTATCCA TGAGCTGCTC GCCAGTCATG GGGACACAGA
2001 ACACATGGTG TACTTTGCAG TGATCATGCA GGACTATGAG CGGGTGGTGG
2051 CTTACCACTG TCAGCACGAG GCCTACGAGG AGGCCCTGGC CGTGCTCGCC
2101 CGCCACCGTG ACCCCCAGCT CTTCTACAAG TTCTCACCCA TCCTCATCCG
2151 TCACATCCCC CGCCAGCTTG TAGATGCCTG GATTGAGATG GGCAGCCGGC
2201 TGGATGCTCG TCAGCTCATT CCTGCCCTGG TGAACTACAG CCAGGGTGGT
2251 GAGGTCCAGC AGGTGAGCCA GGCCATCCGC TACATGGAGT TCTGCGTGAA
2301 CGTGCTGGGG GAGACTGAGC AGGCCATCCA CAACTACCTG CTGTCACTGT
2351 ATGCCCGTGG CCGGCCGGAC TCACTACTGG CCTATCTGGA GCAGGCTGGG
2401 GCCAGCCCCC ACCGGGTGCA TTACGACCTC AAGTATGCGC TGCGGCTCTG
2451 CGCCGAGCAT GGCCACCACC GCGCTTGTGT CCATGTCTAC AAGGTCCTAG
2501 AGCTGTATGA GGAGGCCGTG GACCTGGCCC TGCAGGTGGA TGTGGACCTG
2551 GCCAAGCAGT GTGCAGACCT GCCTGAGGAG GATGAGGAAT TGCGCAAGAA
2601 GCTGTGGCTG AAGATCGCAC GGCACGTGGT GCAGGAAGAG GAAGATGTAC
2651 AGACAGCCAT GGCTTGCCTG GCTAGCTGCC CCTTGCTCAA GATTGAGGAT
2701 GTGCTGCCCT TCTTTCCTGA TTTCGTCACC ATCGACCACT TCAAGGAGGC
2751 GATCTGCAGC TCACTTAAGG CCTACAACCA CCACATCCAG GAGCTGCAGC
2801 GGGAGATGGA AGAGGCTACA GCCAGTGCCC AGCGCATCCG GCGAGACCTG
2851 CAGGAGCTGC GGGGCCGCTA CGGCACTGTG GAGCCCCAGG ACAAATGTGC
2901 CACCTGCGAC TTCCCCCTGC TCAACCGCCC TTTTTACCTC TTCCTCTGTG
2951 GCCATATGTT CCATGCTGAC TGCCTGCTGC AGGCTGTGCG ACCTGGCCTG
3001 CCAGCCTACA AGCAGGCCCG GCTGGAGGAG CTGCAGAGGA AGCTGGGGGC
3051 TGCTCCACCC CCAGCCAAGG GCTCTGCCCG GGCCAAGGAG GCCGAGGGTG
3101 GGGCTGCCAC GGCAGGGCCC AGCCGGGAAC AGCTCAAGGC TGACCTGGAT
3151 GAGTTGGTGG CCGCTGAGTG TGTGTACTGT GGGGAGCTGA TGATCCGCTC
3201 TATCGACCGG CCGTTCATCG ACCCCCAGCG CTACGAGGAG GAGCAGCTCA
3251 GTTGGCTGTA GGAGGGTGTC ACCTTTGATG GGGGTGGGCA ATGGGGAGCA
3301 GTGGCTTGAA CCCACTTGAG AAGGCTGCCT CCTAGGCTCT GCTCAGTCAT
3351 CTTGCAATTG CCACACTGTG ACCACGTTGA CGGGAGTAGA GTAGCGCTGT
3401 TGGCCAGGAG GTGTCAGGTG TGAGTGTATT CTGCCAGCTT TTCATGCTGT
3451 TCTTCAGAGC TGCAGTTATG CCAGACCATC AGCCTGCCTC CCAGTAGAGG
3501 CCCTTCACCT GGAGAAGTCA GAAATCTGAC CCAATTCCAC CCCCTGCCTC
3551 TAGCACCTCT TCTGTCCCTG TCATTCCCCA CACACGTCCT GTTCACCTCG
3601 AGAGAGAGAG AGAGAGAGCA CCTTTCTTCC GTCTGTTCAC TCTGCGGCCT
3651 CTGGAATCCC AGCTCTTCTC TCTCAGAAGA AGCCTTCTCT TCCTCCTGCC
3701 TGTAGGTGTC CCAGAAGTGA GAAGGCAGCC TTCGAAGTCC TGGGCATTGG
3751 GTGAGAAAGT GATGCTAGTT GGGGCATGCT TTTGTGCACA CTCTCTGGGG
3801 CTCCAGTGTG AAGGGTGCCC TGGGGCTGAG GGCCTTGTGG AGGATGGTCG
3851 GTGGTGGTGA TGGAGGTGGA GAGCATTAAA CTGTCTGCAC TGCAAAAAAA
3901 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAGAAAAAA AAAAAAAAAA
3951 A

55 

BLAST Results

Medline entries

97218037:

Shestopal SA, Makunin IV, Belyaeva ES, Ashburner M, Zhimulev IF.
Mol Gen Genet 1997 Feb 20;253(5):642-8

Robinson JS, Graham TR, Emr SD. A putative zinc finger protein,
Saccharomyces cerevisiae Vps18p, affects late Golgi functions
required for vacuolar protein sorting and efficient alpha-factor
prohormone maturation. Mol Cell Biol 1991 Dec;11(12):5813-21

92049305:

Preston RA, Manolson MF, Becherer K, Weidenhammer E, Kirkpatrick D,
Wright R, Jones EW. Isolation and characterization of PEP3, a gene
required for vacuolar biogenesis in Saccharomyces cerevisiae.
Mol Cell Biol 1991 Dec;11(12):5801-12



30 



Peptide information for frame 1 



ORF from 340 bp to 3256 bp, peptide length: 973
Category: similarity to known protein 
Classification: Cellular transport and traffic 



1 MASILDEYEN SLSRSAVLQP GCPSVGIPHS GYVNAQLEKE VPIFTKQRID
51 FTPSERITSL VVSSNQLCMS LGKDTLLRID LGKANEPNHV ELGRKDDAKV
101 HKMFLDHTGS HLLIALSSTE VLYVNRNGQK VRPLARWKGQ LVESVGWNKA
151 LGTESSTGPI LVGTAQGHIF EAELSASEGG LFGPAPDLYF RPLYVLNEEG
201 GPAPVCSLEA ERGPDGRSFV IATTRQRLFQ FIGRAAEGAE AQGFSGLFAA
251 YTDHPPPFRE FPSNLGYSEL AFYTPKLRSA PRAFAWMMGD GVLYGALDCG
301 RPDSLLSEER VWEYPEGVGP GASPPLAIVL TQFHFLLLLA DRVEAVCTLT
351 GQVVLRDHFL EKFGPLKHMV KDSSTGQLWA YTERAVFRYH VQREARDVWR
401 TYLDMNRFDL AKEYCRERPD CLDTVLAREA DFCFRQRRYL ESARCYALTQ
451 SYFEEIALKF LEARQEEALA EFLQRKLASL KPAERTQATL LTTWLTELYL
501 SRLGALQGDP EALTLYRETK ECFRTFLSSP RHKEWLFASR ASIHELLASH
551 GDTEHMVYFA VIMQDYERVV AYHCQHEAYE EALAVLARHR DPQLFYKFSP
601 ILIRHIPRQL VDAWIEMGSR LDARQLIPAL VNYSQGGEVQ QVSQAIRYME
651 FCVNVLGETE QAIHNYLLSL YARGRPDSLL AYLEQAGASP HRVHYDLKYA
701 LRLCAEHGHH RACVHVYKVL ELYEEAVDLA LQVDVDLAKQ CADLPEEDEE
751 LRKKLWLKIA RHVVQEEEDV QTAMACLASC PLLKIEDVLP FFPDFVTIDH
801 FKEAICSSLK AYNHHIQELQ REMEEATASA QRIRRDLQEL RGRYGTVEPQ
851 DKCATCDFPL LNRPFYLFLC GHMFHADCLL QAVRPGLPAY KQARLEELQR
901 KLGAAPPPAK GSARAKEAEG GAATAGPSRE QLKADLDELV AAECVYCGEL
951 MIRSIDRPFI DPQRYEEEQL SWL
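The relationship between the insert sequence and the peptide listed above can be checked by translating the open reading frame codon by codon. The following is a minimal sketch, assuming the standard genetic code; the helper name `translate` and the table construction are illustrative and not part of the original report.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    dna = dna.upper().replace(" ", "")
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First ten codons of the ORF, copied from the insert sequence at position 340.
orf_start = "ATGGCGTCCA TCCTGGATGA GTACGAGAAC"
print(translate(orf_start))  # MASILDEYEN
```

A codon containing a character outside A, C, G and T (for example an ambiguous base) raises a `KeyError` in this sketch, which makes transcription damage in a sequence listing easy to spot.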
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BLASTP hits 

5 

No BLASTP hits available 

Alert BLASTP hits for DKFZphmel2_7gm, frame 1

SWISSPROT:DOR_DROME DEEP ORANGE PROTEIN, N = 1, Score = 1271, P = 2.4e-130

PIR : AM1143 vacuolar membrane protein PEP3 - yeast (Saccharomyces 
15 cerevisiae) i N - 3-. Score = 2bb-. P = S-le-H? 



20 



25 



>SWISSPROT:DOR_DROME DEEP ORANGE PROTEIN
Length = 1,002

HSPs:

Score = 1271 (191.1 bits), Expect = 2.4e-130, P = 2.4e-130
Identities = SOS/fl^ (352) •» Positives = 1b3/fi»»? (51*) 



Query : 13D 

KVRPLARUKGflLVESVGUNICALGTESSTGPILVGTAfiGHIFEAELSASEGGLFGPAPDLY Ifll 
KVR + ++K + +V +N G ESSTGPIL+GT++G IFE EL+ + G 

+ 

30 Sbjct: 1SS KVRRIEKFKDHEITAVAFNPYHGNESSTGPILLGTSRGLIFETELNPAADG- 
■ — HV<2 2Dfl 

tJuery: 110 FRPLYVLNEEGGPA-PVCSLEAERGPDG- 
RSFVIATTR<2RLFfiFIGRAAEGAEA<2GFS6L 217 
35 + LY L G P P+ L+ R P+ R ++ T+ + ++ F + 

AE + + 

Sbjct: £01 RKflLYDLGL-GRPKYPITGLKLLRVPNSSRYIIVVTSPECIYTF-- 
(2ETLKAEERSL(3AI 2b5 

40 Query: 21fl FAAYTD — 

HPPPFREFPSNLGYSELAFYTPKLRSAPRAFAWIW6DGVLYGAL — DCGRPD 303 

FA Y P E ++L +S+L F+ P P+ +AW+ G+G+ G L 

+ 

Sbjct: Ebb 

45 FAGYVSGVtfEPHCEERKTDLTFS(3LRFFAPPNSKYPK<2li)AliJLCGEGIRVGELSIEANSAA 325 

i3uery: 30»4 SLLSEERV UEYPEGVGPGA 

SPPLAIVLTdFHFLLLLADRVEAVCTLTGflVVLRD 357 

+L+ + +E + G + P A VLT++H +LL AD V A+C L 

50 + V ++ 

Sbjct: 32b 

TLIGNTLINLDFEKTMHLSYGERRLNTPKAFVLTEYHAVLLYADHVRAICLLNtJECJVYlJE 365 

fluery: 35fi HFLE- 
55 KFGPLKHMVKDSSTGflLUAYTERAVFRYHVflREARDVURTYLDriNRFDLAKEYCR Mlb 

F E + G + +D TG ++ YT + VF V RE R+VUR YLD 

+++LA + 
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Sbjct: 3&b 

AFDEARVGKPLSIERDELTGSIYVYTVKTVFNLRVTREERNVblRIYLDKGGYEL ATAHAA MMS 
fluery: Ml? 

5 ERPDCLDTVLAREADFCFR(2RRYLESARCYALT<2SYFEEIALKFLEAR(2EEALAEFL<2RK H7b 

E P+ L VL + AD F Y +A YA T FEE+ LKF+ + 

+ +++++ 
Sbjct: MML 

EDPEHL(JLVLC<2RADAAFAI>6SYflVAAI>YYAETDKSFEEVCLKFI1VLPDKRPIINYVKKR SOS 

10 

CJuery: 177 LASL-- -KPAERXXXXXXXXXXXXXXXSRLGALfl 

GDPEALTLYRETKEC-FRTFLSS S5T 

L+ + KP E L L P+ +R + + 

+ F+ 
15 Sbjct: 5Db 

LSRVTTKPMETDELDEIXnNIIKALVIULIDLYLIfllNIIPDKDEEURSSIJOTEYDEFIinE 5b5 
Query- 530 

PRHKEULFASRASIHELLASHGl>TEHI1VYFAVH1l31>YERVVAYHC(2HEAYEEALAVLARH SSI 
20 +R ++ +L + A H D +11 FA + + DY+ VVA + E Y 

EAL L 

Sbjct: Sbb 

AHVLSCTRfiNRETVROLIAEHADPRNHAOFAIAIGDYDEVVAtlflLKAECYAEALflTLINfl bSS 
25 fluery: STD 

R»PflLFYKFSPILIRHIPRC!LVl>AUIEMGSRLl>AR(3LIPALVNYS(3GGEVi3(JVSflAIRYri bM"> 
R+P+LFYK++P LI +P+ VDA + GSRL+ +L+P L+ + E ++ 

+a RY+ 

Sbjct: b2b RNPELFYKYAPELITRLPKPTVDALMAflGSRLEVEKLVPTLI- 
30 If1ENREt3RE£2T<2 — RYL bA2 

fluery: b50 

EFCVNVLGETEflAIHNYLLSLYARGRPDSLLAYLECAGASPHRVHYDLKYALRLCAEHGH 70T 
EF + L T AIHN+LL LYA P L+ YLE G VHYD+ YA 

35 ++C + 

Sbjct: bfl3 

EFAIYKLNTTNDAIHNFLLHLYAEHEPKLLMKYLEIflGRDESLVHYDIYYAHKVCTDLDV 7MB 
fiuery: 710 

40 HRACVHVYKVLELYEEAVDLALfiVDVDLAKCCADLPEEDEELRKKLULKIARHVViJEEE]) 7bT 

A V + +L ■ + AVDLAL D+ LAK+ A P D ++R+KLUL+IA 

H ++ D 

Sbjct: ?i»3 KEARVFLECflLRKbllSAVDLALTFDIIKLAKETASRPS- 
DSKIRRKLhlLRIAYHDIKGTND 601 

45 

tiuery: 77D 

V(JTAMACLASCPLLKIEI>VLPFFPDFVTIDHFKEAICSSLKAYNHHI(3ELcJREf1EEATAS flHI 
V+ A+ L C LL+IED+LPFF DF ID+FKEAIC +L+ YN 

IflELdREM E T 
50 Sbjct: S02 

VKKALNLLKECDLLRIEI>LLPFFADFEKIDNFICEAIC])ALRDYNt2RI(3ELl2REriAETTEi3 flbl 
Query' A3D 

AflRIRRBLiUELRGRYGTVEPCDKCATCDFPLLNRPFYLFLCGHIIFHADCLLflAVRPGLPA fifll 
55 R +L<3+LR TVE <2D C C+ LL +PF+ + F+CGH FH+DCL + 

V P L 

Sbjct: flb5 

TDRATAELfltSLRflHSLTVESODTCEICEfinLLVKPFFIFICGHKFHSDCLEKHVVPLLTK 151 
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fluery: fiTD 

YKdARLEELfiRKLGAAPPPXXXXXXXXXXXXXXXXXXPSREdLKADLDELVAAECVYCGE THT 
+ RL L +++L A R LIC 

5 ++++++AA+C++CG 
Sbjct: ^22 

EflCRRLGTLKtJl2LEAEV(2T(2AflP(2SGALSK<2(2AriELt2RKRAALKTEIEI>lLAAI>CLFCG- IflD 

fluery: "ISO LniRSII>RPFI]>P<2RYEEE<2LSti) 172 
10 L+I +ID+PF+D +E+ + Id 

Sbjct: Ifll LLISTIDflPFVDP— UEflVNVEW 1001 

Score = 2bfl (MO-2 bits)-. Expect = S-be-lT-. P = 3.be-M 
Identities = 11/2fil (32*)-. Positives = mb/2fll (51*) 

15 

fluery: 3b OLEKEVPIFTKdRIDF-TPSE RITSLVVSSNflLCPISLG 

KDTLLRIDLGKANEPN flfl 

+ ++E IF++ ++ PS + L VS N L LG + TLLR 

L +A P 
20 Sbjct: 37 

ETDEEDEIFSRHKHVLRVPSNCTGDLMHLAVSRNULVCLLGTPERTTLLRFFLPRAIPPG lb 

fluery: ST HVELGRK DDAKVHKMFLDHTGSHLLIAL SST EVLYVN-- 

RNGQ KV 131 

25 L + K+ +P1FLD TG H++IAL S+T + LY++ + 

Q KV 
Sbjct: 17 

EAVLEKYLSGSGYKITRMFLDPTGHHIIIALVPKSATAGVSPDFLYIHCLESPflAflQLKV 15b 
30 fluery: 132 

RPLARUKG<2LVESVGWNKALGTESSTGPILVGTA<3GHIFEAELSASEG6LFGPAPDLYFR Ml 
R + ++K + +V +N G ESSTGPIL+GT++G IFE EL+ + G 

+ + 

Sbjct: 157 RRIEKFKDHEITAVAFNPYHGNESSTGPILLGTSRGLIFETELNPAADG 

35 HV(3RK 210 

(2uery: 112 PLYVLNEEGGPA-PVCSLEAERGPDG- 
RSFVIATTRiJRLFCFIGRAAEGAEAflGFSGLFA 2 l 4 e J 

LY L G P P+ L+ R P+ R ++ T+ + ++ F + AE 

40 + +FA 

Sbjct: 211 flLYDLGL-GRPKYPITGLKLLRVPNSSRYIIVVTSPECIYTF-- 
<3ETLKAEERSL<3AIFA 2b7 

(Juery: 250 AYTD — HPPPFREFPSNLGYSELAFYTPKLRSAPRAFAUMflGDGVLYGAL 
45 217 

Y P E ++L +S+L F+ P P+ +AU+ G+G+ G L 

Sbjct: 2bfl GYVSGVf2EPHCEERKTDLTFS<2LRFFAPPNSKYPK<2lilAWLCGEGIRVGEL 
317 

50 

Pedant information for DKFZphmel2_7gm, frame 1



Report for DKFZphmel2_7gm.1



[LENGTH] 973

010 llDlflb-OI 
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EpIJ 5-75 

EHOMOLJ SblISSPR0T:D0R_DR0t1E DEEP ORANGE PROTEIN . le-ms 

EFUNCAT3 30- £5 vacuolar and lysosomal organization ES- 
cerevisiae-i YLRlMflwH 5e-Ml 

EFUNCATH Db-DM protein targeting-i sorting and translocation 

ES • cerevisiae-i YLR14flw3 Se-Ml 
EFUNCATH Dfl-07 vesicular transport (golgi network-, etc.) ES. 
cerevisiae-i YLRlMBwJ 5e-41 
EBL0CKS3 BLDDlOfcjF Galactokinase proteins 
EBL0CKS3 PR01014B 
EBL0CKS3 BPD330bB 
EBLOCKSJ PFOObODB 

EPIRKU3 yeast vacuole le-31 

EPIRKIO transmembrane protein le-31 

EKbO Alpha_Beta 

EKbO LOU_COriPLEXITY 3.31 2 

EKtiD C0ILED_C0IL 4- A3 V. 



SEQ MASILDEYENSLSRSAVLQPGCPSVGIPHSGYVNAQLEKEVPIFTKQRIDFTPSERITSL
SEG 

PRD ccceeeccccccceeeeecccccceeeecccchhhhhhhhhhhhhnhhhhcccccceeee 
COILS 



25 

SEQ VVSSNQLCMSLGKDTLLRIDLGKANEPNHVELGRKDDAKVHKMFLDHTGSHLLIALSSTE
SEG 

PRD eeccceeeeecccccceeeccccccccceeeeehhhhhhhheeecccccceeeeeeccce 
COILS 

30 

SEQ VLYVNRNGQKVRPLARWKGQLVESVGWNKALGTESSTGPILVGTAQGHIFEAELSASEGG
SEG 

PRD eeeeecccccchhhhhcccceeeeeecccccccccccceeeeecccchhhhhhhhhhccc 
35 COILS 



10 



SEQ LFGPAPDLYFRPLYVLNEEGGPAPVCSLEAERGPDGRSFVIATTRQRLFQFIGRAAEGAE
SEG 

40 PRD ccccccccccceeeeecccccccceeecccccccccceeeeeehhhhhhhhhhcchhhhh 
COILS 

SEQ AQGFSGLFAAYTDHPPPFREFPSNLGYSELAFYTPKLRSAPRAFAWMMGDGVLYGALDCG
SEG

PRD hhhchhhhhhhhccccccccccccccccceeeecccccchhhhhhhhcccceeeeeeccc 
COILS 

SEQ RPDSLLSEERVWEYPEGVGPGASPPLAIVLTQFHFLLLLADRVEAVCTLTGQVVLRDHFL
SEG 

PRD cccccchhhhhhccccccccccccchhhhhhhhhhhhhhhhheeeecccchhhhhhhhhh 
COILS 

55 

SEQ EKFGPLKHMVKDSSTGQLWAYTERAVFRYHVQREARDVWRTYLDMNRFDLAKEYCRERPD
SEG

PRD hcccccccccccccccceeeehhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhccc

SEQ CLDTVLAREADFCFRQRRYLESARCYALTQSYFEEIALKFLEARQEEALAEFLQRKLASL

SEG

PRD cchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhc 
COILS 



SEQ KPAERTQATLLTTWLTELYLSRLGALQGDPEALTLYRETKECFRTFLSSPRHKEWLFASR

SEG xxxxxxxxxxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 



15 

SEQ ASIHELLASHGDTEHMVYFAVIMQDYERVVAYHCQHEAYEEALAVLARHRDPQLFYKFSP

SEG r 

PRD hhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhcce 
COILS 

20 

SEQ ILIRHIPRQLVDAWIEMGSRLDARQLIPALVNYSQGGEVQQVSQAIRYMEFCVNVLGETE

SEG 

PRD eeeeccccchhhhhhhhccccccccccchhhhhccccchhhhhhhhhhhhhhhhccccch 
25 COILS 



SEQ QAIHNYLLSLYARGRPDSLLAYLEQAGASPHRVHYDLKYALRLCAEHGHHRACVHVYKVL

SEG < 

30 PRD hhhhhhhhhhhhhcccchhhhhhhhcccccccccchhhhhhhhhhhhcccccceeehhhh 
COILS 



SEQ ELYEEAVDLALQVDVDLAKQCADLPEEDEELRKKLWLKIARHVVQEEEDVQTAMACLASC

35 SEG 

PRD hhhhhhhhhhhhhchhhhhhhhhccccchhhhhhhhhhhhhhhhhhcchhhhhhhhhhhc 
COILS 



SEQ PLLKIEDVLPFFPDFVTIDHFKEAICSSLKAYNHHIQELQREMEEATASAQRIRRDLQEL

SEG -. .... 

PRD ccchhhhhhcccccceeechhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

45 

SEQ RGRYGTVEPQDKCATCDFPLLNRPFYLFLCGHMFHADCLLQAVRPGLPAYKQARLEELQR

SEG • 

PRD hhhheeeeccccccccccccccceeeeeeeccchhhhhhhhhhhccchhhhhhhhhhhhh 
COILS 

50 CCCCCCCC 

SEQ KLGAAPPPAKGSARAKEAEGGAATAGPSREQLKADLDELVAAECVYCGELMIRSIDRPFI

SEG xxxxxxxxxxxxxxxxxx. 

PRD hhhhhcchhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhccccceeeecccccc 
55 COILS 



SEQ DPQRYEEEQLSWL
SEG

PRD chhhhhhhhhccc
COILS

(No Prosite data available for DKFZphmel2_7gm.1)
(No Pfam data available for DKFZphmel2_7gm.1)

group: melanoma derived

DKFZphmel2_7kn encodes a novel 234 amino acid protein without
similarity to known proteins.

Transcripts can be found in almost any tissue, but are most
abundant in kidney and retina.

No informative BLAST results; no predictive Prosite, Pfam or SCOP
motifs.

The new protein can find application in studying the expression
profile of melanoma-specific genes.



unknown protein 

20 

first ATG in frame 1
Sequenced by MediGenomix
Locus: /map="3"

Insert length: 2386 bp

Poly A stretch at pos. 2343, polyadenylation signal at pos. 2323
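The polyadenylation signal quoted above can be located by scanning the 3' end of the insert for the canonical AATAAA hexamer or the common ATTAAA variant. The sketch below assumes plain 1-based coordinates; the function name `find_polya_signals` and the regular expression are illustrative and are not taken from the annotation pipeline that produced this report.

```python
import re

# Canonical polyadenylation hexamer plus the common ATTAAA variant.
# The lookahead lets overlapping hits be reported as well.
SIGNAL = re.compile(r"(?=(AATAAA|ATTAAA))")


def find_polya_signals(seq: str, first_base: int = 1):
    """Return (1-based position, hexamer) for every signal found in seq."""
    seq = seq.upper().replace(" ", "")
    return [(m.start() + first_base, m.group(1)) for m in SIGNAL.finditer(seq)]


# Bases 2301-2340 of the insert, taken from the sequence listing.
fragment = "TTTGTTTTGT ATCCTTTCAC TGTAATAAAT CATGGCCGTG"
print(find_polya_signals(fragment, first_base=2301))  # [(2324, 'AATAAA')]
```

Counting from 1, the hexamer starts at base 2324, one base downstream of the position 2323 given in the report. This may simply reflect a different position convention in the original annotation.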

30 

1 GGCAAAAGTC CAGGAATTAT CTTCATCCCT GGCTATCTTT CTTATATGAA
51 TGGTACAAAA GCGTTGGCGA TTGAGGAGTT TTGCAAATCT CTAGGTCACG
101 CCTGCATAAG GTTTGATTAC TCAGGAGTTG GAAGTTCAGA TGGTAACTCA
151 GAGGAAAGCA CACTGGGGAA ATGGAGAAAA GATGTTCTTT CTATAATTGA
201 TGACTTAGCT GATGGGCCAC AGATTCTTGT TGGATCTAGC CTTGGAGGGT
251 GGCTTATGCT TCATGCTGCA ATTGCACGAC CAGAGAAGGT TGTGGCTCTT
301 ATTGGTGTAG CTACAGCTGC AGATACCTTA GTGACAAAGT TTAATCAGGT
351 TCCTGTTGAG CTAAAAAAGG AAGTAGAGAT GAAAGGTGTG TGGAGCATGC
401 CATCAAAATA CTCTGAAGAA GGAGTTTATA ACGTTCAGTA CAGTTTCATT
451 AAAGAAGCTG AACATCACTG CTTGTTACAT AGCCCAATTC CTGTGAACTG
501 CCCCATAAGA TTGCTCCATG GCATGAAGGA TGACATTGTA CCTTGGCATA
551 CATCAATGCA GGTTGCCGAT CGAGTACTCA GCACAGATGT GGATGTCATC
601 CTCCGAAAAC ACAGTGATCA CCGAATGAGG GAAAAAGCAG ACATTCAACT
651 TCTTGTTTAC ACTATTGATG ACTTAATTGA TAAGCTCTCA ACTATAGTTA
701 ACTAGTATCA CATGTTTAGT TGGTATGTAA ACTAATGTAT CCAGAAGATT
751 GGAAGAGGGA TAAGAAATGA AAGATCCTGA TACTTTAGGT TTTTCCCTTT
801 CCTCTATTTT GTAAATATAA GATGAGTATT ATTTAATGAT GTATTTGCAT
851 AAGTAATGCA AATTGTGAAG AAGGACCAGC TGCTGTTTAG AAAATTTTCT
901 CCTTCCTTCT GTCCTTGATT TTTTTTCATT AAAGTATTTC CTTTTTTTAA
951 TTCAAGAAAA GTTTACCTTT CTTATGCTTA TGTTAGCTAT GCCAGCTCTT
1001 AATTGCATCC TTTTCTAATT AGGATTATTA ATAAAGCGTG AATATTTTGT
1051 TTTTTATTAT AGACAGAAAT TTGTAACATT ACTTCTGATT TGAAAATGCA
1101 ATTCACAAAA TATAGGGAAA TTTTTATTGA AGTAAATTTG AAATGATGGA
1151 GAAATTTCAG AAGCATAATA AAGTTCACAA TAAGGATAAT ACTTTATATA
1201 ATGTATAAAG TATATATAAT ATAATATATA TGTTATATAA ACTGCACATT
1251 ATATTCAAAC TTAAAATTGA GCTTTTTTTT TAAAGGCCCA AAATTGTACA
1301 GTGATACAAG GAGCTATTTC TAAAATTTGG CTTATGTATA ATATATTTAA
1351 ATGGGGAATT TCATCTAAAA CAATGATGTA GTATTTTTAA TATTCTGATT



1401 GGTAAAATTA AAGAGGAAAT TAATCTTTAT ATATTATTTC TTGCAGAAAC
1451 ATTCATTATT TTATTAATAT TGCCCTAAGT ACAACTAGGC AAGTGATTGC
1501 CACCTAAATC AGAAGACGTT CTAAAGTCAG TAAGAAAGTG TGAAATGCTA
1551 GTATAAAGGT TATTTTTTTT CTTTCCTAAA TAACTAAAGT GAGGTGTAGA
1601 TTGAGCCTTG ATATTATTTA GTTAATGTTT TTTATTAATT AATTTTGGCT
1651 GGACTTTATT TAGCTTGATT AGGTTATTAT CTGTCAAACC TTTTAAGTTG
1701 ACAACATGAC TCATATATAT ACATGTGTAT AAGATGAGCA TGTGTCGAAG
1751 ACTTATTCGA CTCATTAATG AGGAAACCAG CAGATAGTAA ACCTGGTTCA
1801 AAGTACAATT CAAGAAACTG AGTATTTATG GGCATTGAAG AAAAAATGTT
1851 GAGATAAAAT TGCTGTGCAG AAAAAAGTGT TAATGAAGCC GACCTGACTA
1901 CTTAACCTTA GAGACCTGCT TTACAAGGTT GGCCCTTGAT TGGCATCTGG
1951 GAACTTGGAG TTCAGGGGGC TTCCACCATT CCCAGAACTG ATCAAAGTAG
2001 CTTACTATAT CTAAACTGTA AAACAATATA GTTTCTCCTG AACACCTGCT
2051 TTCCTTCTGG GAGTCTGGAA TTTTGGTATG TGCCAGGCAG AGACTACCTT
2101 TGTGACCAGC TCCCAGTAAA AACCCCAGGC ACTCAGTCTC TAACAAGCTT
2151 TTCTGGTTGA CAGTGTTTCA CAAGTGCTGT TACAACTGGT TGCTGGGAGA
2201 ATTAAGCTCA TCCTCTGTGA TTCCACTGGC GGAGGATTCT TGGAAGCTTG
2251 CACTTAGTTT CCCCTGACTT CACCCCATGT GTCTTTTTTC CTTTGCTGAT
2301 TTTGTTTTGT ATCCTTTCAC TGTAATAAAT CATGGCCGTG AGCAGAAAAA
2351 AAAAAAAAAA AAAAAAAAAT AAAAAAAAAA AAAAAA



BLAST Results 



25 



No BLAST result 



Medline entries 



30 



No Medline entry 



35 



Peptide information for frame 1 



ORF from 46 bp to 702 bp; peptide length: 219
Category: similarity to unknown protein
Classification: unclassified 



1 MNGTKALAIE EFCKSLGHAC IRFDYSGVGS SDGNSEESTL GKWRKDVLSI
51 IDDLADGPQI LVGSSLGGWL MLHAAIARPE KVVALIGVAT AADTLVTKFN
101 QLPVELKKEV EMKGVWSMPS KYSEEGVYNV QYSFIKEAEH HCLLHSPIPV
151 NCPIRLLHGM KDDIVPWHTS MQVADRVLST DVDVILRKHS DHRMREKADI
201 QLLVYTIDDL IDKLSTIVN
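The peptide length follows directly from the ORF coordinates, since each amino acid corresponds to three bases. The short sketch below assumes 1-based, inclusive coordinates that exclude the stop codon; `codon_count` is a hypothetical helper introduced for illustration.

```python
def codon_count(first_base: int, last_base: int) -> int:
    """Number of complete codons between two 1-based, inclusive coordinates."""
    span = last_base - first_base + 1
    if span % 3 != 0:
        raise ValueError(f"span of {span} bases is not a whole number of codons")
    return span // 3


print(codon_count(46, 702))  # 219, counting from the first ATG
print(codon_count(1, 702))   # 234, if frame 1 is read from base 1
```

Under these assumptions, the 219-residue peptide corresponds to translation from the first ATG at base 46, while reading frame 1 from the first base of the insert would give 234 codons. That is one possible reading of the 234 amino acid figure quoted in the description of this clone.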



50 

BLASTP hits 

No BLASTP hits available 
Alert BLASTP hits for DKFZphmel2_7kn, frame 1

No Alert BLASTP hits found

Pedant information for DKFZphmel2_7kn, frame 1

Report for DKFZphmel2_7kn.1

5 

CLENGTHD) 

EriliO 2M3CH-lfl 
CpIJ 

10 CHOnOLJ PIRrATlb^l hypothetical protein RP3M3 - Rickettsia 

prowazekii 3e-21 
EBL0CKS3 BPDM3S2K 
CBLOCKSH PR0Dfi2fiE 
EKIill Alpha_Beta 

15 

SEQ MNGTKALAIEEFCKSLGHACIRFDYSGVGSSDGNSEESTLGKWRKDVLSIIDDLADGPQI
PRD ccchhhhhhhhhhhhccceeeeeeccccccccccccccccchhhhhhhhhhhhhccccee

SEQ LVGSSLGGWLMLHAAIARPEKVVALIGVATAADTLVTKFNQLPVELKKEVEMKGVWSMPS
PRD eeecccchhhhhhhhhhccceeeeeeeeeehhhhhhcccccchhhhhhhhhhhheeeccc

SEQ KYSEEGVYNVQYSFIKEAEHHCLLHSPIPVNCPIRLLHGMKDDIVPWHTSMQVADRVLST
PRD ccccccceeeehhhhhhhhhhhhhhhccccccceeecccccccccccchhhhhhhhhhhh

SEQ DVDVILRKHSDHRMREKADIQLLVYTIDDLIDKLSTIVN
PRD hheeeeeccccchhhhhhhheeeeehhhhhhhhcccccc



(No Prosite data available for DKFZphmel2_7kn.1)
(No Pfam data available for DKFZphmel2_7kn.1)

DKFZphtes3_10i16

group: nucleic acid management

DKFZphtes3_10i16 encodes a novel amino acid protein with
similarity to human ZK1.

The ZK1 gene is one of the early response genes induced by exposure
to ionizing radiation, and plays a role in radiation-induced
apoptotic cell death in hematopoietic cells. The novel protein
contains 18 zinc finger domains, an RGD cell attachment sequence and
an ATP_GTP_A domain.

The new protein can find application in diagnosis/therapy of
leukemia predisposition/disease and in the modulation of DNA repair.

similarity to ZK1 (Homo sapiens), complete cds.
Sequenced by Qiagen
Locus: unknown
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Insert length: 2 
Poly A stretch a 

5 

1 CGGAAATGGA 
SI GTTGGTAACC 
101 GCTTCTGTCG 
151 TGTAGAGAGG 
10 201 CTTTGAGGAT 
H51 ATATTTCCCA 
301 AACCTGACCT 
351 GTACCAAAAC 
M01 ATGAAATTAA 
15 M51 GATGACAGAC 

501 ATGTGACAGC 
551 ATATGAGCAT 
bDl TATGGACCAA 
b51 CAGGTATCGC 
20 701 AACCCTATGC 
751 ATTCGAAGAC 
fiDl ATTTTGTGGG 
A51 GAACTCACAC 
T01 TTTACTTATT 
25 "151 GAAGCCCTAT 

1001 CCTATCATAG 
1051 AAAGAATGTG 
1101 AAGGACCCAC 
1151 GCTTATCCTA 
30 1201 GAAAGACCTT 
1251 GTCATTTCA A 
1301 GCAAGCA ATG 
1351 GAAAGGATTC 
1M01 AGCCTTCAGA 
35 1M51 GAGAGAAACC 
1501 TCACACCTTC 
1551 ATGTAAGGAA 
IbOl ATGAAAGGAC 
1L51 AAATGTAGTA 
40 1701 ACATGAAAAA 
1751 GTAAAGCCTT 
IflOl ACTGGAGAGA 
1A51 TGCCTCACAC 
nOl ATGAGTGTAA 
45 1^51 AAGCATGGTA 
2001 TGGGAAAGCC 
2051 ACACTGGAGA 
2101 AAATTCTCTT 
2151 CTATGAATGT 
50 2201 AAATACATGC 
2251 TGCGGAAAAG 
2301 TCATATGGGA 
2351 GCTAGCCTGG 
2M01 CACTATGAAT 
55 2M51 TCGATATCAT 
2501 AGTGTGGGAA 
2551 ACTCACACGG 
2b01 CATGAAAGGA 



M bp 

pos- 2fiLl-» polyadeny 



GGGGGTCGCT TTCCTCACCT 
GGTCAGACCA GCCCGAGAGG 
CTCTGTCGCC TGCGCTATGC 
ACCCCGGTAC ATCTGAAAGC 
GTGGCTGTGA ACTTCACCCA 
GAAGAATCTC TTCAGGGAAG 
CTATAGGAAA AAAATGGAGT 
CCCAGAAGAA GCTTCAGGAG 
AGAAGACAGT CATTGTGGAG 
TGAACTTCCA GGAGAAGAAA 
TTTGTGTGTG CAGAAGTTGG 
CAGAGGTGAC ACTGGACACA 
AGCCATATAA GTGTCAACAA 
CCATCCATTA GAACACAAGA 
TTGTAAAGTC TGTGGAAAAA 
ACATGGTAAT GCACAGTGGG 
AAAGCCTTCC ATTCTTTCAG 
TGGAGAGAAA CCATATGAAT 
CTGCTACCCT TCAAATACAT 
GAATGTAGCA AATGTGATAA 
ACATGAAAGA AGTCACATGG 
GAAAAGCATT TGCATATACC 
TCTGGGAAAA AACCGTATGA 
TCTTATAAGT TTTCAAACAC 
ATAAATGTAA GATATGTGGG 
ACACATGAAA AAACTCACAC 
TGGT AAAGCC TTCAATCTTT 
ACACTGGAGA GAAACCCTAT 
TCTGCCTCAC AGCTTCGAGT 
CTATGAATGT AAGGAATGTG 
GAGTGCATGG TAGGACTCAT 
TGTGGGAAAG CCTTCAGATA 
AGAAAAACAC ATAAGAATGC 
TATGTGAGAA AGGCTTTTAT 
ACTCACACTG GAGAGAAACC 
CAGATGTTGC AATTCCCTTC 
AACCCTATGA GTGTAAGCAA 
CTTCGAATGC ATGAAAGGAC 
GCAATGTGGG AAAGCCTTCA 
GGACTCACAC TGGAGAGAAA 
TTCAGATCTG CCTCAAACCT 
GAAACCCTAT GAATGTAAGG 
CTTTTCAAAT ACATGAAAGG 
AAGCATTGTG GGAATGGATT 
AAGAACACAC ATTGGAGAGA 
CATTCAATTA TTTTTCTTCC 
GAGAAGCCAT ATGAATGTAA 
TTCCTTTTAT GGACATGAAT 
GCAAGCAATG TGGCAAAACT 
GAAAGGACTC ACACTGGGGA 
AGCCTTCATT CCTTTTACTT 
GAGAGAAACC CTATGAGTGT 
CTTACACTGG AGTGAAACCC 



ation signal at pos- 2635 



TCCTCGCTGC GCGGGCGGCG 
GACCTGGTGC CTGTACCCAG 
CCTGCTGTAG TCACAGGAGC 
CGGGAAATGG ACCCAGTGGC 
GGAAGAGTGG ACATTGCTGG 
TGATGCTGGA AACTTTCAGG 
GACCAGAACA TTGAATATGA 
TCTCATAGAA GAGAAAGTCA 
AAACTTTTAC CCAGGTTCCA 
GCTTCTCCTG AAGTAAAATC 
CATAGGTAAC TCATCTTTTA 
AGGCATATGA GTATCAGGAA 
CCTAAAAATA AGAAAGCCTT 
AAGGGATCAC ACTGGAGAGA 
CCTTTATTTT CCATTCAAGC 
GATGGAACTT ATAAATGTAA 
TTTATATCTT ATCCATGAAA 
GTAAACAATG TGGTAAATCC 
GAAAGAACTC ACACTGGGGA 
AGCATTTCAT AGTTCTAGTT 
GAGAGAAGCC TTATCAATGC 
AGTTCTCTTC GTAGACATGA 
ATGTAAGCAA TATGGGGAAG 
ACATAAGAAT GAACTCTGGA 
AAAGGCTTTT ATTCTGCCAA 
TGGAGAGAAA CGCTATAAAT 
CCAGTTCCTT TCGATATCAT 
GAGTGTAAGC AGTGTGGGAA 
GCACGGTGGG ACTCACACTG 
GGAAAGCCTT CAGATCTACC 
ACTGGAGAGA AACCCTATGA 
TGTGAAGCAC CTTCAAATTC 
CCTCTGGAGA A AGACCTTAT 
TCTGCCAAGT CATTTCAAAC 
CTATGAATGC AACCAATGTG 
GATATCATGA AAGGACTCAC 
TGTGGGAAAG CCTTCAGATC 
TCACACTGGA GAGAAACCCT 
GTTGTGCCTC AAACCTTCGA 
CCCTATGAGT GTAAGCAATG 
TCAGATGCAT GAAAGGACTC 
AATGCGAAAA AGCATTCTGT 
AAGCACAGAG GAGAGAAGCC 
CACATCTGCC A AGATTCTTC 
AACACTATGA ATGTAAGGAA 
TTGCATATAC ACGCAAGGAC 
GGATTGTGGG A AAGCATTCA 
AGACTCACAC TGGAAGGAAG 
TTCACATTTT CCAGTTCTTT 
GAAACCCTAT CAATGTAAGC 
CTTTTCAATG TCATGAAAGG 
ATTCTAGTTC CGTTTGATAT 
TATGAATGTA AGCA ATGTGG 
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2b51 GAAAGCCTTC AGATGTGCCT CGCACCTTCA ACGGC ATGGA AGGGTTCACA 

2701 CTTGGGAGAA ACTCTATGAA TGTAAGCAGT ATGGGAAAGC CTTCAGATCT 

5751 GCCAAGATTC TTTGAATACA GATAATTAAT GTAAACAATT ATCATAAGTA 

SaOl TACTAACATG TTATTCTTTT TAAATAAGAA GGTATAATAA AATATCCCAT 

5 2fl51 TGGTTTTATG TATTAAAAAA AAAAAAAAAA AAAA 



BLAST Results 



10 

No BLAST result 



Medline entries 

15 

Katoh O, Oguri T, Takahashi T, Takai S, Fujiwara Y, Watanabe:
ZK1, a novel Kruppel-type zinc finger gene, is induced following
exposure to ionizing radiation and enhances apoptotic cell death
on hematopoietic cells. Biochem Biophys Res Commun 1998 Aug
28;249(3):595-600

95137393:
Wick MJ, Ann DK, Lee NM, Loh HH. Isolation of a cDNA encoding a
novel zinc-finger protein from neuroblastoma x glioma NG108-15
cells. Gene 1995 Jan 23;152(2):227-32



Peptide information for frame 1

ORF from 127 bp to 2352 bp; peptide length: 742

Category: similarity to known protein
Classification: Nucleic acid management
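
The reported peptide length follows from the ORF coordinates if they are read as 1-based, inclusive and excluding the stop codon. That reading is an assumption, but it is consistent with the figures in this report. A minimal Python sketch, where `orf_peptide_length` is a hypothetical helper name:

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


print(orf_peptide_length(127, 2352))  # frame 1 ORF of this clone -> 742
```

The same arithmetic can serve as a sanity check on any ORF/peptide-length pair whose digits are in doubt.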

Prosite motifs: RGD (146-148)
ATP_GTP_A (195-202)
ZINC_FINGER_C2H2 (196-216)
ZINC_FINGER_C2H2 (224-244)
ZINC_FINGER_C2H2 (252-272)
ZINC_FINGER_C2H2 (280-300)
ZINC_FINGER_C2H2 (308-328)
ZINC_FINGER_C2H2 (364-384)
ZINC_FINGER_C2H2 (392-412)
ZINC_FINGER_C2H2 (420-440)
ZINC_FINGER_C2H2 (448-468)
ZINC_FINGER_C2H2 (510-530)
ZINC_FINGER_C2H2 (538-558)
ZINC_FINGER_C2H2 (566-586)
ZINC_FINGER_C2H2 (594-614)
ZINC_FINGER_C2H2 (622-642)
ZINC_FINGER_C2H2 (650-670)
ZINC_FINGER_C2H2 (678-698)
ZINC_FINGER_C2H2 (706-726)
ZINC_FINGER_C2H2 (476-498)
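
The ZINC_FINGER_C2H2 hits correspond to the PROSITE pattern PS00028, C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H. As a rough sketch (not the scanner that produced the list above), the pattern can be written as a Python regular expression and applied to a fragment of the frame 1 peptide. The fragment choice and the helper name `find_c2h2` are illustrative assumptions.

```python
import re

# PS00028 rewritten as a regular expression; "." stands for any residue.
C2H2 = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")


def find_c2h2(peptide: str, offset: int = 1):
    """Return (start, end, match) tuples with 1-based inclusive coordinates.

    `offset` is the position of the first residue of `peptide` in the
    full-length protein. Only non-overlapping matches are reported.
    """
    return [(m.start() + offset, m.end() + offset - 1, m.group())
            for m in C2H2.finditer(peptide)]


fragment = "TYKCKFCGKAFHSFSLYLIHERTHTGEKPY"  # residues 221-250 of frame 1
for hit in find_c2h2(fragment, offset=221):
    print(hit)  # (224, 244, 'CKFCGKAFHSFSLYLIHERTH')
```

The hit at 224-244 agrees with the second zinc finger in the motif list.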



  1 MPCCSHRSCR EDPGTSESRE MDPVAFEDVA VNFTQEEWTL LDISQKNLFR
 51 EVMLETFRNL TSIGKKWSDQ NIEYEYQNPR RSFRSLIEEK VNEIKEDSHC
101 GETFTQVPDD RLNFQEKKAS PEVKSCDSFV CAEVGIGNSS FNMSIRGDTG
151 HKAYEYQEYG PKPYKCQQPK NKKAFRYRPS IRTQERDHTG EKPYACKVCG
201 KTFIFHSSIR RHMVMHSGDG TYKCKFCGKA FHSFSLYLIH ERTHTGEKPY
251 ECKQCGKSFT YSATLQIHER THTGEKPYEC SKCDKAFHSS SSYHRHERSH
301 MGEKPYQCKE CGKAFAYTSS LRRHERTHSG KKPYECKQYG EGLSYLISFQ
351 THIRMNSGER PYKCKICGKG FYSAKSFQTH EKTHTGEKRY KCKQCGKAFN
401 LSSSFRYHER IHTGEKPYEC KQCGKAFRSA SQLRVHGGTH TGEKPYECKE
451 CGKAFRSTSH LRVHGRTHTG EKPYECKECG KAFRYVKHLQ IHERTEKHIR
501 MPSGERPYKC SICEKGFYSA KSFQTHEKTH TGEKPYECNQ CGKAFRCCNS
551 LRYHERTHTG EKPYECKQCG KAFRSASHLR MHERTHTGEK PYECKQCGKA
601 FSCASNLRKH GRTHTGEKPY ECKQCGKAFR SASNLQMHER THTGEKPYEC
651 KECEKAFCKF SSFQIHERKH RGEKPYECKH CGNGFTSAKI LQIHARTHIG
701 EKHYECKECG KAFNYFSSLH IHARTHMGEK PYECKDCGKA FS
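
The peptide can be re-derived from the cDNA with the standard genetic code. The following Python sketch shows one way to do so; it is not the tool used to prepare this listing. The helper name `translate`, the use of "X" for unreadable codons and the 30-base example (bases 127-156 of the insert) are assumptions made for illustration.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(cdna: str, start_bp: int = 1) -> str:
    """Translate from a 1-based start position until a stop codon or the end."""
    seq = cdna.upper().replace(" ", "")
    peptide = []
    for pos in range(start_bp - 1, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[pos:pos + 3], "X")  # X: unreadable codon
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


orf_start = "ATGCCCTGCTGTAGTCACAGGAGCTGTAGA"  # bases 127-156 of the insert
print(translate(orf_start))  # MPCCSHRSCR
```

Translating the whole insert from position 127 in this way is a convenient cross-check for residues that are hard to read in the printed peptide.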

20 



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_10i16, frame 1
No Alert BLASTP hits found

Peptide information for frame 2

ORF from 1703 bp to 2584 bp; peptide length: 294
Category: questionable ORF
Classification: no clue

  1 MKKLTLERNP MNATNVVKPS DVAIPFDIMK GLTLERNPMS VSNVGKPSDL
 51 PHTFECMKGL TLERNPMSVS NVGKPSVVPQ TFESMVGLTL ERNPMSVSNV
101 GKPSDLPQTF RCMKGLTLER NPMNVRNAKK HSVNSLLFKY MKGSTEERSP
151 MNVSIVGMDS HLPRFFKYMQ EHTLERNTMN VRNAEKHSII FLPCIYTQGL
201 IWERSHMNVR IVGKHSASLV PFMDMNRLTL EGSTMNASNV AKLSHFPVLF
251 DIMKGLTLGR NPINVSSVGK PSFLLLLFNV MKGLTRERNP MSVF

45 



BLASTP hits 

50 No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_10i16, frame 2

TREMBL:AF153201_1 product: "zinc finger protein dp"; Homo sapiens
zinc finger protein dp mRNA, complete cds., N = 1, Score = 225,
P = 4.1e-18

>TREMBL:AF153201_1 product: "zinc finger protein dp"; Homo sapiens
zinc finger protein dp mRNA, complete cds.
Length = 123

HSPs:

Score = 225 (33.8 bits), Expect = 4.1e-18, P = 4.1e-18
Identities = 84/246 (34%), Positives = 128/246
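
The percentages in each HSP summary are the match counts divided by the aligned length. Judging from the values in this report they appear to be truncated rather than rounded; that is an inference, not a documented rule. A small Python sketch, reading the first HSP as 84 identities and 128 positives over 246 aligned positions (`hsp_percent` is a hypothetical helper name):

```python
def hsp_percent(matches: int, aligned_length: int) -> int:
    """Integer percentage of matching positions, truncated toward zero."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matches) // aligned_length


print(hsp_percent(84, 246))   # identities of the first HSP -> 34
print(hsp_percent(128, 246))  # positives of the first HSP -> 52
```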

Query: lb VVKPSDVA- 

IPFDInTGLTLERNPnSVSNVGKPSDLPHTFECflKGLTLERNPIISVSNVGK ?H 

V KPS AIFI++LRN+V+VKS T ++G 
15 TLERNP++V +VGK 

Sbjct: 3 VGKPSVRAQILFCIRESI- 

LGRNHIHVISVAKVSVRIQTLLNIEGSTLERNPINVMSVGK bl 

Query: 75 

20 PSVVPQTFESMVGLTLERNPMSVSNVGKPSDLPQTFRCnKGLTLERNPtlNVRNAKKHSVN 13M 

+ Q+ + G LERNP+ V NV KPS Q + TLER+ +V 

+ A K V 
Sbjct: b2 

LLIRAQSLFYIRGFILERNPIPVINVAKPSVGFQILLIINEFTLERSLTHVISAIKCLVE 121 

25 

Query: 135 SLLFKYUKGSTEERSPFINVSIVGHDS- 
HLPRFFKYMfiEHTLERNTMNVRNAEKHSIIFLP 1=13 

+ + + R+PMNV VG P F +++E TLERN fl+V 

K + 

30 Sbjct: 122 DEILLNITEFICVRNPflNVMNVGKPLVRAPTLF- 
FIRESTLERNLUHVVIVLKALVAVQI ISO 

Query: 1TM 

CIYTflGLIUERSHMNVRIVGKHSASLVPFIIDriNRLTLEGSTflNASNVAKLSHFPVLFDiri 253 
35 + + ER+HJ1+V V K +++ TL S + A V K S 

+ + 

Sbjct: Ifll 

LLSIKEYTLERNHnHVISVIKVLVKAQTSLNIREYTLVKSLIIAIVVRKPSVRVLTLFFI 2M0 

40 Query: 25M KGLTLGRN 2bl 

+ TL +N 
Sbjct: 241 REFTLEKN 2»4fl 

Score = 215 (32.3 bits), Expect = 1.1e-16, P = 1.1e-16
Identities = 82/246 (33%), Positives = 124/246 (50%)

Query: MM 

VGKPSDLPHTFECMKGLTLERNPnSVSNVGKPSVVPQTFESMvGLTLERNPMSVSNVGKP 1D3 
VGKPS C++ L RN + V +V K SV QT ++' G 

50 TLERNP++V +VGK 
Sbjct: 3 

VGKPSVRAQILFCIRESILGRNHIHVISVAKVSVRIQTLLNIEGSTLERNPINVtlSVGKL b2 

Query: 1DM SDLPQTFRCMKGLTLERNPMNVRNAKKHSVNSLLFKYilKGSTEERSPMNV- 
55 SIVGM » 15=) 

Q+ ++G LERNP+ V N K SV + + T ERS +V S 

+ » 






WO 01/98454 PCT/IB01/02050 
Sbjct: b3 

LIRA(3SLFYIRGFILERNPIPVINVAKPSVGF(3ILLIINEFTLERSLTHVISAIKCLVEI> 122 

Query: IbD SHLPRFFKYMfSEHTLERNTMNVRNAEKHSIIFLPCIY- 
TdGLIWERSHflNVRIVGKHSAS 21fi 

L + ++(3 RN HNV N K ++ P ++ + ER + M + V 

IV K + 

Sbjct: 123 EILLNITEFIflV RNPMNVMNVGK- 

PLVRAPTLFFIRESTLERNLMH VVIVLKALVA 177 



<2uery: 21T LVPFNDHNRLTLEGSTMNASNVAK- 
LSHFPVLFDinKGLTLGRNPINVSSVGKPSFLLLL 277 

+ + + TLE + M+ + V K L +1 + TL ++ I V 

KPS +L 

15 Sbjct: 176 VdlLLSIKE YTLERNHI1HVISVIKVLVKAt3TSLNIRE- 
YTLVKSLIIAIVVRKPSVRVLT 23b 

tfuery: 27fl FNVMKGLTRERN 2&^ 
++ T E+N 
20 Sbjct: 237 LFFIREFTLEKN 2*** 

Score = 207 (31.1 bits), Expect = 5.2e-15, P = 5.2e-15
Identities = 80/270 (29%), Positives = 129/270 (47%)

25 fluery: 1 IIKKLTLERNPHNATN VVKPSDVAIPFPI- 

MKGLTLERNPriSVSNVGKPSDLPHTFECnKG ST 

+++ L RN ++ +V K S V I + ++G TLERNP++V +VGK 

+ + + G 

Sbjct: lb IRESILGRNHIHVISVAKVS- 
30 VRIt2TLLNIEGSTLERNPINVnSVGKLLIRA(3SLFYIRG 7H 

tfuery: bO 

LTLERNPI1SVSNVGKPS VVP(3TFESnVGLTLERNPHSVSNVGKPSPLP(2TFRCnKGLTLE 11^ 
LERNP+ V NV KPSV Q + TLER+ V + K + 

35 + 

Sbjct: 75 

FILERNPIPVINVAICPSVGFc3ILLIINEFTLERSLTHVISAIKCLVEDEILLNITEFI(3V 13*4 
<2uery: 120 

40 RNPHNVRNA<KHSVNSLLFKYnKGSTEERSPHNVSIVGMI>SHLPRFFKYrir3EHTLERNTri 17T 

RNPMNV N K V + +++ ST ER+ M+V IV + 

++E+TLERN M 
Sbjct: 135 

RNPnNVnNVGKPLVRAPTLFFIRESTLERNLriHVVIVLKALVAVfllLLSIKEYTLERNHM 1TM 

45 

<2uery: IflD 

NVRNAEKHSIIFLPCIYTflGLIblERSHnNVRIVGKHSASLVPFMDriNRLTLEGSTHNASN 23T 
+V + K + + + +S + +V K S ++ + TLE 

+ + 
50 Sbjct: ITS 

HVISVIKVLVKAiSTSLNIREYTLVKSLIIAIVVRKPSVRVLTLFFIREFTLEKNYYLCTfl 25M 

fiuery: 2M0 VAKLSHFPVLFDIHKGLTL—GRNPINVSSVGK 270 
+K F + D++K + G P S K 
55 Sbjct: 255 CSK-- SFStflSDLIKHdRIHTGEKPYKCSECRK 2fiS 

Score = 181 (27.2 bits), Expect = 1.4e-11, P = 1.4e-11
Identities = 74/269 (27%), Positives = 116/269 (43%)
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<2uery: 5 

TLERNPnNATNVVKPSDVAIPFDII1KGLTLERNPf1SVSNV6KPSI>LPHTFECMK6LTLER bM 
TLERNP+N +V K A ++G LERNP+ V NV KPS 

5 + TLER 

Sbjct: MS 

TLERNPINVMSVGKLLIRA(3SLF YIRGFILERNPIPVINVAKPSV6Ft3ILLIINEFTLER 1D7 
Query: fe,5 

10 NPnSVSNVGKPSVVP(2TFES[1VGLTLERNPMSVSNVGKPSDLP(2TFRCnKGLTLERNPnN 1EM 

+ V + K V + ++ RNPI1 + V NVGKP T + + 

TLERN M + 
Sbjct: IDA 

SLTHVISAIKCLVEDEILLNITEFIdSVRNPnNVnNVGKPLVRAPTLFFIRESTLERNLMH lb7 

15 

fluery: 1E5 VRNAKKHSVNSLLFKYMKGSTEERSPflNV- 
SIVGHDSHLPRFFKYntfEHTLERNTNNVRN lfi3 

V K V + +K T ER+ 11 + V S++ + ++E+TL 

+ + + 

20 Sbjct: Iba VVIVLKALVAVdILLSIKEYTLERNHMHVISVIKVLVKArJTSLN- 
IREYTLVKSLIIAIV EEb 

fluery: IflM 

AEKHSIIFLPCIYTc36LIUERSHnNVRIV6KHSASLVPFMDnNRLTLEGSTriNASNVAKL EM3 
25 K S+ L + + E+++ K + + + R+ 

S K 

Sbjct: EE7 

VRKPSVRVLTLFFIREFTLEKNYYLCT(2CSKSFS(2ISI>LIKHr3RIHTGEKPYKCSECRKA Efib 

30 fluery: EMM SHFPVLFDIMKGLTLGRNPINVSSVGKPSF E73 

L + + + G+ P GK SF 

Sbjct: EA7 FSflCSLLALH<3RIHTGKKPNPCI>ECGK-SF 315 

Score = 166 (24.9 bits), Expect = 8.4e-10, P = 8.4e-10
Identities = 63/194 (32%), Positives = 89/194 (45%)

Guery: 100 

VGKPSDLP(3TFRCriKGLTLERNPHNVRNAKKHSVNSLLFKYMKGSTEERSPf1NVSIVGni> 15=1 
VGKPS Q C++ L RN ++V + K SV + + GST 

40 ER+P+NV VG 
Sbjct: 3 

VGKPSVRAflILFCIRESILGRNHIHVISVAKVSVRI(3TLLNIEGSTLERNPINVMSVGKL bE 
Query' IbO 

45 SHLPRFFKYM(2EHTLERNTnNVRNAEKHSIIFLPCIYT(3GLILIERSHnNVRIVGKHSASL El^ 

+ Y++ LERN + V N K S+ F + ERS +V 

K 

Sbjct: b3 

LIRA(3SLFYIRGFILERNPIPVINVAKPSVGF(3ILLIINEFTLERSLTHVISAIKCLVED 1EE 

50 

<2uery: SED VPFP1DMNRLTLEGSTMNASNVAK- 
LSHFPVLFDII1KGLTLGRNPINVSSVGKPSFLLLLF E7a 

+++ + MN NV K L P LF I + TL RN ++V V K 

+ + 

55 Sbjct: 1E3 EILLNITEFIdVRNPMNVMNVGKPLVRAPTLFFIRES- 
TLERNLHHVVIVLKALVAV6IL iai 

Query: E7^ NVMKGLTRERNPMSV E^3 
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+ K T ERN M V 
Sbjct: IflH LSIKEYTLERNHMHV 1U 
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Pedant information for DKFZphtes3_10ilb -, frame 1 
Report for DKFZphtes3_10i16.1

[LENGTH] 784
[MW] 90857.05
[pI] 9.24
[HOMOL] TREMBL:AB011414_1 gene: "ZK1"; product: "Kruppel-type zinc finger protein"; Homo sapiens ZK1 mRNA for Kruppel-type zinc finger protein, complete cds. 0.0
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YJL056c] 6e-33
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YJL056c] 6e-33
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YOR113w] 5e-24
[FUNCAT] 04.01.01 rrna synthesis [S. cerevisiae, YPR186c PZF1 - TFIIIA] 1e-20
[FUNCAT] 04.03.01 trna synthesis [S. cerevisiae, YPR186c PZF1 - TFIIIA] 1e-20
[FUNCAT] 13.04 homeostasis of other ions [S. cerevisiae, YNL027w] 1e-13
[FUNCAT] 11.07 detoxification [S. cerevisiae, YGL254w] 2e-12
[FUNCAT] 01.02.04 regulation of nitrogen and sulphur utilization [S. cerevisiae, YGL254w] 2e-12
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YGL209w] 2e-11
[FUNCAT] 04.05.99 other mrna-transcription activities [S. cerevisiae, YER028c] 3e-10
[FUNCAT] 11.01 stress response [S. cerevisiae, YKL062w] 1e-09
[FUNCAT] 01.01.04 regulation of amino-acid metabolism [S. cerevisiae, YDR253c] 5e-09
[FUNCAT] unclassified proteins [S. cerevisiae, YBR066c] 3e-08
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YDR146c] 1e-07
[FUNCAT] 03.22 cytokinesis [S. cerevisiae, YLR131c] 2e-06
[BLOCKS] BL00466 TFIIS zinc ribbon domain proteins
[BLOCKS] BL00245A Phytochrome chromophore attachment site proteins
[BLOCKS] DMDIISIB
[BLOCKS] PF01363B
[BLOCKS] BL01030
[BLOCKS] PF00096B
[BLOCKS] BL00028 Zinc finger, C2H2 type, domain proteins
[BLOCKS] BP014213E
[BLOCKS] BP0MB13C
[BLOCKS] BP0M213B
[SCOP] d2adr 7.31.1.1.4 ADR1 [synthetic based on yeast (Saccharomyce 2e-05
[PIRKW] nucleus 1e-53
[PIRKW] RNA binding 2e-58
[PIRKW] duplication 1e-34
[PIRKW] tandem repeat 1e-171
[PIRKW] spermatogenesis 5e-62
[PIRKW] zinc 1e-169
[PIRKW] zinc finger 0.0
[PIRKW] DNA binding 0.0
[PIRKW] metal binding 1e-120
[PIRKW] phosphoprotein 2e-58
[PIRKW] leucine zipper 1e-53
[PIRKW] alternative splicing 2e-58
[PIRKW] eye lens 1e-111
[PIRKW] oocyte 1e-106
[PIRKW] transcription factor 1e-111
[PIRKW] embryo 1e-106
[PIRKW] segmentation 1e-34
[PIRKW] transcription regulation 1e-152
[SUPFAM] POZ domain homology 7e-83
[SUPFAM] transcription factor Krueppel 1e-34
[SUPFAM] zinc finger protein ZFP-36 1e-173
[SUPFAM] transcription factor IIIA 8e-31
[PROSITE] ATP_GTP_A 1
[PROSITE] RGD 1
[PROSITE] ZINC_FINGER_C2H2 18
[PFAM] Zinc finger, C2H2 type
[PFAM] TNFR/NGFR cysteine-rich region
[KW] Irregular
[KW] 3D
[KW] LOW_COMPLEXITY 3.57 %



SEQ RKWRGSLSSPSSLRGRRLVTGQTSPRGTWCLYPGFCRSVACAMPCCSHRSCREDPGTSES
SEG xxxxxxxxxxxxxxx
1meyF

SEQ REMDPVAFEDVAVNFTQEEWTLLDISQKNLFREVMLETFRNLTSIGKKWSDQNIEYEYQN
SEG
1meyF

SEQ PRRSFRSLIEEKVNEIKEDSHCGETFTQVPDDRLNFQEKKASPEVKSCDSFVCAEVGIGN
SEG
1meyF

SEQ SSFNMSIRGDTGHKAYEYQEYGPKPYKCQQPKNKKAFRYRPSIRTQERDHTGEKPYACKV
SEG
1meyF

SEQ CGKTFIFHSSIRRHMVMHSGDGTYKCKFCGKAFHSFSLYLIHERTHTGEKPYECKQCGKS
SEG
1meyF

SEQ FTYSATLQIHERTHTGEKPYECSKCDKAFHSSSSYHRHERSHMGEKPYQCKECGKAFAYT
SEG xxxxxxxxxxxxx

SEQ SSLRRHERTHSGKKPYECKQYGEGLSYLISFQTHIRMNSGERPYKCKICGKGFYSAKSFQ
SEG
1meyF

SEQ THEKTHTGEKRYKCKQCGKAFNLSSSFRYHERIHTGEKPYECKQCGKAFRSASQLRVHGG
SEG
1meyF

SEQ THTGEKPYECKECGKAFRSTSHLRVHGRTHTGEKPYECKECGKAFRYVKHLQIHERTEKH
SEG
1meyF

SEQ IRMPSGERPYKCSICEKGFYSAKSFQTHEKTHTGEKPYECNQCGKAFRCCNSLRYHERTH
SEG
1meyF

SEQ TGEKPYECKQCGKAFRSASHLRMHERTHTGEKPYECKQCGKAFSCASNLRKHGRTHTGEK
SEG
1meyF . -TTTEETTTTTCEETTHHHHHHHHHHHHTTCCEEETTTTEEECCHHHHHHHHHHHHCCC

SEQ PYECKQCGKAFRSASNLQMHERTHTGEKPYECKECEKAFCKFSSFQIHERKHRGEKPYEC
SEG
1meyF CEEETTTTEEECCHHHHHHHHHHH

SEQ KHCGNGFTSAKILQIHARTHIGEKHYECKECGKAFNYFSSLHIHARTHMGEKPYECKDCG
SEG
1meyF

SEQ KAFS
SEG
1meyF



Prosite for DKFZphtes3_10i16.1

PS00016   188->191   RGD                PDOC00016
PS00017   237->245   ATP_GTP_A          PDOC00017
PS00028   238->259   ZINC_FINGER_C2H2   PDOC00028
PS00028   266->287   ZINC_FINGER_C2H2   PDOC00028
PS00028   294->315   ZINC_FINGER_C2H2   PDOC00028
PS00028   322->343   ZINC_FINGER_C2H2   PDOC00028
PS00028   350->371   ZINC_FINGER_C2H2   PDOC00028
PS00028   406->427   ZINC_FINGER_C2H2   PDOC00028
PS00028   434->455   ZINC_FINGER_C2H2   PDOC00028
PS00028   462->483   ZINC_FINGER_C2H2   PDOC00028
PS00028   490->511   ZINC_FINGER_C2H2   PDOC00028
PS00028   552->573   ZINC_FINGER_C2H2   PDOC00028
PS00028   580->601   ZINC_FINGER_C2H2   PDOC00028
PS00028   608->629   ZINC_FINGER_C2H2   PDOC00028
PS00028   636->657   ZINC_FINGER_C2H2   PDOC00028
PS00028   664->685   ZINC_FINGER_C2H2   PDOC00028
PS00028   692->713   ZINC_FINGER_C2H2   PDOC00028
PS00028   720->741   ZINC_FINGER_C2H2   PDOC00028
PS00028   748->769   ZINC_FINGER_C2H2   PDOC00028
PS00028   518->541   ZINC_FINGER_C2H2   PDOC00028
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Pfam for DKFZphtes3_iailb . 1 



15 HIiri_NAME TNFR/NGFR cysteine-rich region 



20 



Hiin 

duery 
b7 



*CpeGtYtD-b)NH\/pqClpC. -trCePEriGdYrivqPCTwTGNTVC* 
C + +++ +++++C C ++C+++ G++++++ + + V 
3D CLYPGFCRSVACAMPC — CSHRSCREDPGTSESRENDP VA 



25 



30 



35 



40 



45 



50 



55 



Hni1_NAME Zinc finger-. C2H2 type 

HMI1 *CpwPDCgKtFrrwsNLrRHMRTH* 

C++ CGKTF S+ RRHH +H 
duery 236 CKV — CGKTFIFHSSIRRHMVNH 25fi 

32 - IS (bits) f: Ebb t: 2flb Target: dkf zphtes3_10ilb - 1 
similarity to ZK1 (Homo sapiens)-! complete cds. 

Alignment to HMF1 consensus: 
fluery *CpwPDCgKtFrrwsNLrRHMRTH* 

C++ CGK+F + S + +H RTH 

dkfzphtes3 2bb CKF — CGKAFHSFSLYLIHERTH 2flb 

Query f: 2<?M t: 3m Target: dkf zphtes3_10ilb • 1 

si mi 1 ar ity to ZK1 (Homo sapiens ) ■> complete cds . 

Alignment to HMM consensus: 
HfW ^ *CpwPDCgKtFrrwsNLrRH(1RTH* 

C+ CGK+F+++ +L++H RTH 
duery 2^4 CKQ — CGKSFTYSATLfilHERTH 3m 

34.22 (bits) f: 322 t: 3M2 Target: dkf 2phtes3_10ilb . 1 
similarity to ZK1 (Homo sapiens)-! complete cds- 
Alignment to HMH consensus: 

*CpwPDCgKtFrrwsNLrRHI1RTH* 
C++ C+K+F ++S++ RH R+H 
CSK — CDKAFHSSSSYHRHERSH 342 



fluery 

dkf zphtes3 



322 



duery f: 3S0 t: 370 Target: dkf zphtes3_lDi lb . 1 

similarity to ZK1 (Homo sapiens)-! complete cds- 

Alignment to HNP1 consensus : 
HMH *CpwPDCgKtFrrwsNLrRHNRTH* 

C++ CGK+F + S+LRRH RTH 
(Suery 350 CKE — CGKAFAYTSSLRRHERTH 370 



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32-0=1 (bits) f: 40b t: 42b Target: dkf zph t es3_lDilb . 1 
similarity to ZK1 (Homo sapiens)-, complete cds- 

Alignment to WW consensus: 
fluery *CpwPDCgKtFrrwsNLrRHMRTH* 
5 C++ CGK F ++ ++++H +TH 

dkfzphtes3 40b CKI--CGKGFYSAKSFi2THEK:TH 42b 

Query f: 434 t: 454 Target: dkf zphtes3_10ilb.l 

similarity to ZK1 (Homo sapiens)-! complete cds- 
10 Alignment to HUM consensus: 

HW1 *CpwPDCgKtFrrwsNLrRHI1RTH* 

C+ CGK+F+ +S++R H R+H 
tfuery 434 QKQ — CGKAFNLSSSFRYHERIH 4 54 

15 32-T4 (bits) f: 4b2 t: 4fl2 Target: dkf zphtes3_10ilb - 1 
similarity to ZK1 (Homo sapiens)i complete cds- 

Alignment to HUH consensus: 
Query *CpwPDCgKtFrrwsNLrRHriRTH* 

C+ CGK+FR++S+LR H TH 
20 dkfzphtes3 4b2 CKQ-- CGKAFRSASflLRVHGGTH 4fi2 

Query f: 4^0 t: 510 Target: dkf zphtes3_10ilb . 1 

similarity to ZK1 (Homo sapiens)-i complete cds- 
Alignment to HUH consensus: 
25 HMM *CpwPDCgKtFrrwsNLrRHI1RTH* 

C++ CGK+FR+ S + LR H RTH 
Query 4^0 CKE--CGKAFRSTSHLRVHGRTH SID 

30- bT (bits) f: 51fl t: 540 Target: dkf zphtes3_10ilb . 1 
30 similarity to ZK1 (Homo sapiens) t complete cds- 

Alignment to HUH consensus: 
fluery *CpwPDCgKtFrrwsNLrRHHR • . T - H* 

C++ CGK+FR+ +L++H R H 
dkfzphtes3 Slfl CKE — CGKAFRYVKHLf2IHERTE-KH 540 

35 

Query f: 552 t: 572 Target: dkf zphtes3_10ilb - 1 

similarity to ZK1 (Homo sapiens)-, complete cds- 

Alignment to HUM consensus: 
HUM ~ *CpwPDCgKtFrrwsNLrRHf1RTH* 
40 C++ C+K F ++ ++++H +TH 

fiuery 552 CSI--CEKGFYSAKSF(2THEKTH 572 

31- 33 (bits) f: 5fl0 t: bOO Target: dkf zphtes3_10ilb . 1 
similarity to ZK1 (Homo sapiens)-! complete cds- 

45 Alignment to HW1 consensus: 

<2uery *CpwPDCgKtFrrwsNLrRHf1RTH* 

C+ CGK+FR +LR H RTH 
dkfzphtes3 SflO CN(2 — CGKAFRCCNSLRYHERTH bOD 

50 Query f: bOfi t: b2fi Target: dkf zphtes3_10ilb . 1 

similarity to ZK1 (Homo sapiens) n complete cds- 

Alignment to HMM consensus : 
Hnii *CpwPDCgKtFrrwsNLrRHMRTH* 

C+ CGK+FR++S+LR+H RTH 
55 Query bOfi CKd — CGKAFRSASHLRMHERTH b2fl 

35-30 (bits) f: b3b t: b5b Target: dkf zphtes3_10ilb . 1 
similarity to ZK1 (Homo sapiens)-, complete cds- 
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Alignment to HUH consensus: 
fluery *CpwPDCgKtFrrwsNLrRHMRTH* 

C + CGK+F+ +SNLR+H RTH 
dkf zphtes3 b3k CKQ — CGKAFSCASNLRKHGRTH bSb 

5 

<3uery f: bfc.M t: bflM Target: dkf zphtes3_10ilb . 1 

similarity to ZK1 (Homo sapiens)-, complete cds- 

Alignment to HMM consensus: 
HUM *CpwPDCgKtFrrwsNLrRHMRTH* 
10 C+ CGK+FR++SNL++H RTH 

(3uery bbM CK<3 — CGKAFRSASNLCMHERTH bfiM 

31. 7^ (bits) f: L^2 t: 712 Target: dkf zphtes3_10ilb . 1 
similarity to ZK1 (Homo sapiens)i complete cds- 
15 Alignment to HW1 consensus: 

duery *CpwPDCgKtFrrwsNLrRHHRTH* 

C++ C+K+F+ S++++H R H 
dkfzphtes3 b^5 CKE — CEKAFCKFSSFdIHERKH 712 

20 duery f: 720 t: 7M0 Target: dkf zphtes3_10ilb - 1 

similarity to ZK1 (Homo sapiens)-! complete cds- 

Alignment to HUM consensus: 
Hfin ~ *CpwPDCgKtFrrwsNLrRHNRTH* 

C++ CG F+++ L++H RTH 
25 tfuery 720 CKH — CGNGFTSAKILdlHARTH 7M0 

3M-Bfi (bits) f: 7MB t: 7fc,fi Target: dkf zphtes3_lDilfc» • 1 
similarity to ZK1 (Homo sapiens) ^ complete cds- 
Alignment to Hfin consensus: 
30 Query *CpwPDCgKtFrrwsNLrRHMRTH* 

C++ CGK+F++ S + L +H RTH 
dkf zphtes3 7Mfi CKE--CGK AFNYFSSLHIHARTH 7t.fi 



Pedant information for DKFZphtes3_10i16, frame 2
Report for DKFZphtes3_10i16.2

[LENGTH] 294
[MW] 33083.98
[pI] 9.97
[HOMOL] TREMBL:AF153201_1 product: "zinc finger protein dp"; Homo sapiens zinc finger protein dp mRNA, complete cds. 7e-17
[KW] All_Alpha

SEQ MKKLTLERNPMNATNVVKPSDVAIPFDIMKGLTLERNPMSVSNVGKPSDLPHTFECMKGL
PRD cccccccccccceeeeecccccchhhhhccccccccccccccccccccccccchhhhhee

SEQ TLERNPMSVSNVGKPSVVPQTFESMVGLTLERNPMSVSNVGKPSDLPQTFRCMKGLTLER
PRD ecccccccccccccccchhhhhhhhhhhhhccccccccccccccccchhhhhhhhhhhcc

SEQ NPMNVRNAKKHSVNSLLFKYMKGSTEERSPMNVSIVGMDSHLPRFFKYMQEHTLERNTMN
PRD cccccccccccccccccccccccccccccccceeeeecccchhhhhhhhhhhhhhhcccc

SEQ VRNAEKHSIIFLPCIYTQGLIWERSHMNVRIVGKHSASLVPFMDMNRLTLEGSTMNASNV
PRD chhhhhhheeeccceeechhhhhcccceeeeeccccceeeeccchhhhhhhccccccccc

SEQ AKLSHFPVLFDIMKGLTLGRNPINVSSVGKPSFLLLLFNVMKGLTRERNPMSVF
PRD cccccccchhhhhhhhcccccccccccccccchhhhhhhhhccccccccccccc

(No Prosite data available for DKFZphtes3_10i16.2)
(No Pfam data available for DKFZphtes3_10i16.2)
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DKFZphtes3_10nlD 



group: testis derived

DKFZphtes3_10n10 encodes a novel 502 amino acid protein without
similarity to known proteins.

The mRNA is differentially polyadenylated and the novel protein
is ubiquitously expressed.

No informative BLAST results; no predictive Prosite, Pfam or SCOP
motif.

The new protein can find application in studying the expression
profile of testis-specific genes.

unknown protein
differentially polyadenylated
Sequenced by Qiagen
Locus: unknown

Insert length: 2551 bp

Poly A stretch at pos. 2531, polyadenylation signal at pos. 2513

30 

   1 CTCAGCCTCC CAAGTGGCTG GGACTGCAGG TTCTAAATGG CTTCTAAGAA
  51 GTTGGGTGCA GATTTTCATG GGACTTTCAG TTACCTTGAT GATGTCCCAT
 101 TTAAGACAGG AGACAAATTC AAAACACCAG CTAAAGTTGG TCTACCTATT
 151 GGCTTCTCCT TGCCTGATTG TTTGCAGGTT GTCAGAGAAG TACAGTATGA
 201 CTTCTCTTTG GAAAAGAAAA CCATTGAGTG GGCTGAAGAG ATTAAGAAAA
 251 TCGAAGAAGC CGAGCGGGAA GCAGAGTGCA AAATTGCGGA AGCAGAAGCT
 301 AAAGTGAATT CTAAGAGTGG CCCAGAGGGC GATAGCAAAA TGAGCTTCTC
 351 CAAGACTCAC AGTACAGCCA CAATGCCACC TCCTATTAAC CCCATCCTCG
 401 CCAGCTTGCA GCACAACAGC ATCCTCACAC CAACTCGGGT CAGCAGTAGT
 451 GCCACGAAAC AGAAAGTTCT CAGCCCACCT CACATAAAGG CGGATTTCAA
 501 TCTTGCTGAC TTTGAGTGTG AAGAAGACCC ATTTGATAAT CTGGAGTTAA
 551 AAACTATTGA TGAGAAGGAA GAGCTGAGAA ATATTCTGGT AGGAACCACT
 601 GGACCCATTA TGGCTCAGTT ATTGGACAAT AACTTGCCCA GGGGAGGCTC
 651 TGGGTCTGTG TTACAGGATG AGGAGGTCCT GGCATCCTTG GAACGGGCAA
 701 CCCTAGATTT CAAGCCTCTT CATAAACCCA ATGGCTTTAT AACCTTACCA
 751 CAGTTGGGCA ACTGTGAAAA GATGTCACTG TCTTCCAAAG TGTCCCTCCC
 801 CCCTATACCT GCAGTAAGCA ATATCAAATC CCTGTCTTTC CCCAAACTTG
 851 ACTCTGATGA CAGCAATCAG AAGACAGCCA AGCTGGCGAG CACTTTCCAT
 901 AGCACATCCT GCCTCCGCAA TGGCACGTTC CAGAATTCCC TAAAGCCTTC
 951 CACCCAAAGC AGTGCCAGTG AGCTCAATGG GCATCACACT CTTGGGCTTT
1001 CAGCTTTGAA CTTGGACAGT GGCACAGAGA TGCCAGCCCT GACATCCTCC
1051 CAGATGCCTT CCCTCTCTGT TTTGTCTGTG TGCACAGAGG AATCATCACC
1101 TCCAAATACT GGTCCCACGG TCACCCCTCC TAATTTCTCA GTGTCACAAG
1151 TGCCCAACAT GCCCAGCTGT CCCCAGGCCT ATTCTGAACT GCAGATGCTG
1201 TCCCCCAGCG AGCGGCAGTG TGTGGAGACG GTGGTCAACA TGGGCTACTC
1251 GTACGAGTGT GTCCTCAGAG CCATGAAGAA GAAAGGAGAG AATATTGAGC
1301 AGATTCTCGA CTATCTCTTT GCACATGGAC AGCTTTGTGA GAAGGGCTTC
1351 GACCCTCTTT TAGTGGAAGA GGCTCTGGAA ATGCACCAGT GTTCAGAAGA
1401 AAAGATGATG GAGTTTCTTC AGTTAATGAG CAAATTTAAG GAGATGGGCT
1451 TTGAGCTGAA AGACATTAAG GAAGTTTTGC TATTACACAA CAATGACCAG
1501 GACAATGCTT TGGAAGACCT CATGGCTCGG GCAGGAGCCA GCTGAGACCA
1551 GGCCCTGCCT AGGCCCTGCC GCAGAACCAC CATCCCTGGG AGGCCCTGCA
1601 GAGCCCACCT GTGGGGAAAG AGAAGGGGCA GCTTCCGGAT TTTCTTTTGG
1651 GGGTTAGAAG GTCAGGTGTG GAGACTGCTC GCCAGTCTCT GTGAGCCTAG
1701 GCCCTGAGCT GGGGAGG1GG GGAAGATTCG GGCATGTGAG TGCCCCCAGA
1751 ACTGTCCTGG CTCCTTCCGT ATTAAACGCA TTTGCATTTT GAGAAGTGTC
1801 CTTCCCACTT CAGCCCTCCG GAGAGACTAC CCTAGTCTTT CTGGGGTGTT
1851 TATGTCCTCA GCTGAAGCCT GGCCTAGTTG CTGAGAGGGG CTGGGGAGAT
1901 GGGGCGGGAG GGCCAGACTC AGTGCTGCTG TGGAGCTAGG TGCTTCCCCC
1951 TTCCCCTGAG ACTGGTTGAC TGAACTCCAG TCAAGTTGAG TTCAAGTGAA
2001 AGATTCTTCC AGGGTTTTAT TTTTTCCCCT CCTAACAAAG TCTCATAGTG
2051 TTAACACTGG TTCTGCAATA TCTCTGAGGT GCAAAGAATG CACTTTTCCC
2101 TATGGGGCCC AGAGTTTGCC TTTTCTGCCA GGCAGTCACC ACGCTTCCCT
2151 ACCCCAGCCT GTTTCTTTTG GCTTGGTTTG GACCACAGTC CTCTGCTACC
2201 CAGGGTTTTA GAGCCCCTGC TCTAGGAAAC AGTTTAAGAA ATCATTGGCC
2251 CCTTCCCAGC ACATTGAATG GGTAAGCAGA CAGGCCATGA TTTAGTTGGC
2301 CAGCACTAAC TCCACCTCTG TTCTCCTTGA ACAGCTTCCC CTCCAGCCCA
2351 CTGCTTTAGG ATGACACAAT GAATAACACC TAGTCATAGA AATCAGTCTC
2401 TCTGGTTTGT TTTGTATTAT GTTGTACATC ATTAAAGATC TAAATACAAA
2451 GGATATACAG TCTTGAATCT AAAATAATTT GCTAACTATT TTGATTCTTC
2501 AGAGAGAACT ACTAATAAAA ATCTAAAAGG TAAAAAAAAA AAAAAAAAAA
2551 A
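
The reported poly(A) and polyadenylation signal positions can be checked against the final bases of the insert. The Python sketch below looks for the terminal run of A residues and the nearest upstream AATAAA hexamer. The helper name `polya_features`, the minimum run length of 10 and the reading of the reported positions as 0-based offsets are assumptions made for illustration, not details stated in the report.

```python
import re


def polya_features(seq: str, min_run: int = 10):
    """Locate the terminal poly(A) run and the closest upstream AATAAA hexamer.

    Returns 0-based (signal_index, polya_index); a missing feature is None.
    """
    seq = seq.upper()
    tail = re.search(r"A{%d,}$" % min_run, seq)
    if tail is None:
        return None, None
    signal = seq.rfind("AATAAA", 0, tail.start())
    return (signal if signal >= 0 else None), tail.start()


# Bases 2501-2551 of the DKFZphtes3_10n10 insert, block by block.
tail_of_insert = ("AGAGAGAACT" "ACTAATAAAA" "ATCTAAAAGG"
                  "TAAAAAAAAA" "AAAAAAAAAA" "A")
signal, polya = polya_features(tail_of_insert)
print(signal + 2500, polya + 2500)  # 2513 2531
```

Under the 0-based reading, both values agree with the positions given for this clone.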
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BLAST Results 



No BLAST result

Medline entries

No Medline entry

Peptide information for frame 1

ORF from 37 bp to 1542 bp; peptide length: 502
Category: putative protein
Classification: unclassified



  1 MASKKLGADF HGTFSYLDDV PFKTGDKFKT PAKVGLPIGF SLPDCLQVVR
 51 EVQYDFSLEK KTIEWAEEIK KIEEAEREAE CKIAEAEAKV NSKSGPEGDS
101 KMSFSKTHST ATMPPPINPI LASLQHNSIL TPTRVSSSAT KQKVLSPPHI
151 KADFNLADFE CEEDPFDNLE LKTIDEKEEL RNILVGTTGP IMAQLLDNNL
201 PRGGSGSVLQ DEEVLASLER ATLDFKPLHK PNGFITLPQL GNCEKMSLSS
251 KVSLPPIPAV SNIKSLSFPK LDSDDSNQKT AKLASTFHST SCLRNGTFQN
301 SLKPSTQSSA SELNGHHTLG LSALNLDSGT EMPALTSSQM PSLSVLSVCT
351 EESSPPNTGP TVTPPNFSVS QVPNMPSCPQ AYSELQMLSP SERQCVETVV
401 NMGYSYECVL RAMKKKGENI EQILDYLFAH GQLCEKGFDP LLVEEALEMH
451 QCSEEKMMEF LQLMSKFKEM GFELKDIKEV LLLHNNDQDN ALEDLMARAG
501 AS
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BLASTP hits 

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_10n10, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphtes3_10n10, frame 1

Report for DKFZphtes3_10n10.1

[LENGTH] 502
[MW] 55063.76
[pI] 5.02
[BLOCKS] PR01053D
[BLOCKS] BL01306B
[KW] All_Alpha
[KW] LOW_COMPLEXITY 8.57 %

SEQ MASKKLGADFHGTFSYLDDVPFKTGDKFKTPAKVGLPIGFSLPDCLQVVREVQYDFSLEK
SEG xx
PRD cccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhcccchh

SEQ KTIEWAEEIKKIEEAEREAECKIAEAEAKVNSKSGPEGDSKMSFSKTHSTATMPPPINPI
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccccccccchhh

SEQ LASLQHNSILTPTRVSSSATKQKVLSPPHIKADFNLADFECEEDPFDNLELKTIDEKEEL
SEG
PRD hhhhcccccccccccccchhhhhcccccchhhhhcccccccccccccccccchhhhhhhh

SEQ RNILVGTTGPIMAQLLDNNLPRGGSGSVLQDEEVLASLERATLDFKPLHKPNGFITLPQL
SEG
PRD hhhhccccchhhhhhhhcccccccccccchhhhhhhhhhhhhcccccccccccccccccc

SEQ GNCEKMSLSSKVSLPPIPAVSNIKSLSFPKLDSDDSNQKTAKLASTFHSTSCLRNGTFQN
SEG
PRD ccccccccccccccccccccccccccccccccccccchhhhhhhhhcccccccccccccc

SEQ SLKPSTQSSASELNGHHTLGLSALNLDSGTEMPALTSSQMPSLSVLSVCTEESSPPNTGP
SEG xxxxxx
PRD ccccccccccccccccccccceeecccccccccccccccccceeeeeeeccccccccccc

SEQ TVTPPNFSVSQVPNMPSCPQAYSELQMLSPSERQCVETVVNMGYSYECVLRAMKKKGENI
SEG xxxxxx
PRD cccccccccccccccccccchhhhhhhcccccchhhhhhhccccchhhhhhhhhhccchh

SEQ EQILDYLFAHGQLCEKGFDPLLVEEALEMHQCSEEKMMEFLQLMSKFKEMGFELKDIKEV
SEG
PRD hhhhhhhhhhhccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh

SEQ LLLHNNDQDNALEDLMARAGAS
SEG
PRD hhcccccchhhhhhhhhhhccc
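
The LOW_COMPLEXITY keyword in the Pedant report appears to be the share of residues masked by SEG (the `x` positions) relative to the peptide length. That interpretation is an assumption; it is supported by the fact that 8.57 % of 502 residues corresponds to about 43 masked residues. A minimal Python sketch, where `low_complexity_percent` is a hypothetical helper and the count of 43 is the value implied by the reported percentage rather than an independently verified count:

```python
def low_complexity_percent(masked_residues: int, length: int) -> float:
    """Percentage of residues flagged as low complexity, to two decimals."""
    if length <= 0:
        raise ValueError("length must be positive")
    return round(100.0 * masked_residues / length, 2)


print(low_complexity_percent(43, 502))  # 8.57
```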



(No Prosite data available for DKFZphtes3_10n10.1)

(No Pfam data available for DKFZphtes3_10n10.1)

DKFZphtes3_11a17

group: transmembrane protein

DKFZphtes3_11a17 encodes a novel 428 amino acid protein without
similarity to known proteins.

The novel protein contains 2 transmembrane regions and one
leucine zipper. The protein is ubiquitously expressed with higher
abundance in stomach, brain and testis.
No informative BLAST results; no predictive Prosite, Pfam or SCOP
motif.

The new protein can find application in studying the expression
profile of testis-specific genes and as a new marker for
testicular cells.

unknown protein

Pedant: TRANSMEMBRANE 2

perhaps differential polyadenylation

Sequenced by Qiagen

Locus: unknown

Insert length: 2591 bp

Poly A stretch at pos. 2570, polyadenylation signal at pos. 2548

40 

1 CTCTCCTGCG CCCTCTGGAG GAAGTGAGAA GAGTCAGTCC CACCCAGCTG
51 CCGCCTGGTA TCTGGGCTCC AGGCCACCGA GTATTTGGCC CCCAGCCACG
101 GAGCCCTTAG CACACACCTC CCCCACAGGT CCTGGAGATG TGGCTGAGCT
151 ACCTGCAGCC GTGGCGGTAC GCGCCTGACA AGCAGGCTCC GGGCAGCGAC
201 TCCCAGCCCC GGTGTGTGTC GGAGAAATGG GCACCCTTTG TCCAGGAGAA
251 CCTGCTGATG TACACCAAGT TGTTTGTGGG CTTTCTGAAC CGCGCGCTCC
301 GCACAGACCT GGTCAGCCCC AAGCACGCGC TCATGGTGTT CCGAGTGGCC
351 AAAGTCTTTG CCCAGCCCAA CCTGGCTGAG ATGATTCAGA AAGGTGAGCA
401 GCTATTCCTG GAGCCAGAGC TGGTCATCCC CCACCGCCAG CACCGACTCT
451 TCACGGCCCC CACATTCACT GGGAGCTTCC TGTCACCCTG GCCACCAGCG
501 GTCACTGATG CCTCCTTCAA GGTGAAGAGC CACGTCTACA GCCTGGAGGG
551 CCAGGACTGC AAGTACACCC CGATGTTTGG GCCCGAGGCC CGCACCCTGG
601 TCCTGCGCCT CGCTCAGCTC ATCACACAGG CCAAACACAC AGCCAAGTCC
651 ATCTCCGACC AGTGTGCGGA GAGCCCGGCT GGCCACTCCT TCCTCTCATG
701 GCTGGGCTTT AGCTCCATGG ACACCAATGG CTCCTACACA GCCAACGACC
751 TGGACGAGAT GGGGCAAGAC AGTGTCCGGA AGACAGATGA ATACCTGGAG
801 AAGGCCCTGG AGTACCTGCG CCAGATATTC CGGCTCAGCG AAGCGCAGCT
851 CAGGCAGTTC ACACTCGCCT TGGGCACCAC CCAGGATGAG AATGGAAAAA
901 AGCAACTCCC CGACTGCATC GTGGGTGAGG ACGGACTCAT CCTTACGCCC
951 CTGGGGCGGT ACCAGATCAT CAATGGGCTG CGAAGGTTTG AAATTGAGTA
1001 CCAGGGGGAC CCGGAGCTGC AGCCCATCCG GAGCTATGAG ATCGCCAGCT
1051 TGGTCCGCAC ACTCTTTAGG CTGTCGTCTG CCATCAACCA CAGATTTGCA
1101 GGACAGATGG CGGCTCTGTG TTCCCGGGAT GACTTCCTCG GCAGCTTCTG
1151 TCGCTACCAC CTCACAGAAC CTGGGCTGGC CAGCAGGCAC CTGCTGAGCC
1201 CTGTGGGGCG GAGGCAGGTG GCCGGCCACA CCCGCGGCCC CAGGCTCAGC
1251 CTGCGCTTCC TGGGCAGTTA CCGGACGCTG GTCTCGCTGC TGCTGGCCTT
1301 CTTCGTGGCC TCTCTGTTCT GCGTCGGGCC CCTCCCATGC ACGCTGCTGC
1351 TCACCCTGGG CTATGTCCTC TACGCCTCTG CCATGACACT GCTGACCGAG
1401 CGGGGGAAGC TGCACCAGCC CTGAAGGTGT CAGCTGCCTT CAGAGCAGGC
1451 TGGAGGGATT TGCCACACAG CCCCACCCTT GGGCTGAGAG GACCTGGGAA
1501 GCCCCTCCAG GAGGGAACAC GGTCATCCTC GGGGTTCTGG AGCGGGGTTC
1551 CTGCAGCCGC AGAGGCATCT GGAGGAAACG CAACCAAGAA AGGAAGGCAG
1601 GTGGGCCCCA GCAAAGGAGT AGCTGCCAGG GCTCAACAGC TACGCTCTGT
1651 GACAGCGCAG AGCTCAGCGC CGGCCTTTCC CTCCCTCCGC CAAGGACTCA
1701 CGGCCAAGCC AGCTCTCGGG GCCTTTTTTC CAGTGCCCAT TTGGCTACTC
1751 TGCTGCACCA AGCTTGGGAG CCAGCCTGCC AACAGCCACC TGGGCCTGGC
1801 CTCCCCACTG GCTGGCCTTG AGGTTGGCAG AGTGGGTTGT GGCGCTTCCT
1851 CTCTCTGTGT GGGACCAGGA CAGTGGCTTA AGTCTCCACT CCAGGAAAGA
1901 ATCAAAGTTT CTAGAGTTGT GAGAAAACCA GAGAGTGGCT GTCCTGATTC
1951 TTCACTGTGA GGGGCGTTCT TCATGTTCTC CCAGCTGTTC CAAGACTGGG
2001 CCGTAGAATT CCATGTTTCA GGAGCCTAAG ACCCTCCCAG AGCCCAGGGG
2051 CTTCACCGCA GACCCCAAGC CATTGAGCAC ATCACCCAAA GCAGTGGCCA
2101 ACATCGCGGA CCCCTGTGCC TTGTCACAGA TGGGTGCTGG TCCTCAGGCG
2151 TTGGGGACAC TGCTGGGTCG ATGGGGTCGG ATTCTGCCAG TTTCTGCTCT
2201 GCAGCCAAAG ATGGTCAGAA GCATTGTCAC TTCAGTAACA TCAAGTGCTC
2251 AAAGACATGG CAACCGTTCA GTGGTACTTA AGTATTCAAA ATATACAACT
2301 ACAGATTCTC TGACAGAAAC CAGCACGGGG TCTTCACCTT CATTCACCCC
2351 ACAGGCGACA TGCGAGGGAG AACAGCATCT CAGTGGTGAT TTCCAAACCA
2401 AGCCTTTGTT TTCGGTGTGG GGTTTTGGGG GTTTGCTTTA ATGTTTTTGA
2451 AATTGTAAAT GTTGGGCTTT TTATTTTGAT GTAAACTGAG AATAATGGCA
2501 TTTTAGGGCC TGTGACCAAA AATGAAGCTT GTAACGACCA TGGATCTGAA
2551 TAAACATGTC CTTGCTTCTG AAAAAAAAAA AAAAAAAAAA A
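
The poly A and polyadenylation signal positions reported for this clone can be checked against the final two rows of the sequence above. The Python sketch below is only an illustration. It assumes, as a reading of the listing rather than anything the listing states, that the reported positions are 0-based offsets, that the signal is the canonical AATAAA hexamer, and that a poly A stretch is a run of at least ten A residues. All function and variable names are hypothetical.

```python
import re

# Final two rows of the DKFZphtes3_11a17 listing (1-based positions 2501 onward).
TAIL_START = 2501
TAIL = (
    "TTTTAGGGCCTGTGACCAAAAATGAAGCTTGTAACGACCATGGATCTGAA"
    "TAAACATGTCCTTGCTTCTG" + "A" * 21
)


def zero_based_offset(tail_index, tail_start=TAIL_START):
    """Convert an index inside TAIL into a 0-based offset in the full insert."""
    return (tail_start - 1) + tail_index


def find_polya_signal(seq):
    """Index of the first AATAAA hexamer in seq, or -1 if there is none."""
    return seq.find("AATAAA")


def find_polya_stretch(seq, min_len=10):
    """Index of the first run of at least min_len A residues, or -1."""
    match = re.search("A{%d,}" % min_len, seq)
    return match.start() if match else -1


signal_pos = zero_based_offset(find_polya_signal(TAIL))
stretch_pos = zero_based_offset(find_polya_stretch(TAIL))
insert_length = TAIL_START + len(TAIL) - 1
print(signal_pos, stretch_pos, insert_length)
```

Under these assumptions the sketch gives 2548 for the signal, 2570 for the poly A stretch and 2591 for the insert length.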

BLAST Results

Entry AF055134 from database EMBLNEW:

Homo sapiens clone 535A5 mRNA sequence.

Score = S7b5i P = S.^e-554., identities = 1155/115b 

3' UTR

Medline entries

No Medline entry

Peptide information for frame 3

ORF from 138 bp to 1421 bp; peptide length: 428
Category: putative protein
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Classification: Transmembrane proteins unclassified 
Prosite motifs: LEUCINE_ZIPPER (404-425)

1 MWLSYLQPWR YAPDKQAPGS DSQPRCVSEK WAPFVQENLL MYTKLFVGFL
51 NRALRTDLVS PKHALMVFRV AKVFAQPNLA EMIQKGEQLF LEPELVIPHR
101 QHRLFTAPTF TGSFLSPWPP AVTDASFKVK SHVYSLEGQD CKYTPMFGPE
151 ARTLVLRLAQ LITQAKHTAK SISDQCAESP AGHSFLSWLG FSSMDTNGSY
201 TANDLDEMGQ DSVRKTDEYL EKALEYLRQI FRLSEAQLRQ FTLALGTTQD
251 ENGKKQLPDC IVGEDGLILT PLGRYQIING LRRFEIEYQG DPELQPIRSY
301 EIASLVRTLF RLSSAINHRF AGQMAALCSR DDFLGSFCRY HLTEPGLASR
351 HLLSPVGRRQ VAGHTRGPRL SLRFLGSYRT LVSLLLAFFV ASLFCVGPLP
401 CTLLLTLGYV LYASAMTLLT ERGKLHQP
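
The leucine zipper assignment can be illustrated on the C-terminal end of this peptide. The sketch below is a minimal example, not the Prosite scanning software itself: it expresses the pattern L-x(6)-L-x(6)-L-x(6)-L as a regular expression and applies it to residues 401 to 428 as read from the listing above (the last-but-one residue is taken to be Q). The names `find_zippers`, `FRAGMENT` and `FRAGMENT_START` are hypothetical.

```python
import re

# Leucine zipper pattern L-x(6)-L-x(6)-L-x(6)-L written as a regular expression.
LEUCINE_ZIPPER = re.compile(r"L.{6}L.{6}L.{6}L")

# Residues 401-428 of the DKFZphtes3_11a17 peptide, as read from the listing.
FRAGMENT_START = 401
FRAGMENT = "CTLLLTLGYVLYASAMTLLTERGKLHQP"


def find_zippers(fragment, first_residue):
    """Return (start, end) residue numbers, 1-based inclusive, of each match."""
    hits = []
    for match in LEUCINE_ZIPPER.finditer(fragment):
        start = first_residue + match.start()
        end = first_residue + match.end() - 1
        hits.append((start, end))
    return hits


hits = find_zippers(FRAGMENT, FRAGMENT_START)
print(hits)
```

With this fragment the single match spans residues 404 to 425, with leucines at 404, 411, 418 and 425.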

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_11a17, frame 3
No Alert BLASTP hits found
Pedant information for DKFZphtes3_11a17, frame 3

Report for DKFZphtes3_11a17.3

[LENGTH] 428

EpI3 a.^E 
[PROSITE] LEUCINE_ZIPPER 1
[KW] TRANSMEMBRANE 2

[KW] LOW_COMPLEXITY 7.48 %

SEO MULSYLCPIilRYAPDKflAPGSDSiSPRCVSEKllAPFVflENLLMYTKLFVGFLNRALRTDLVS 
40 SEG 

PRD cccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhccc 
MEM 

SE<3 PKHALMVFRVAKVFAfiPNLAEMIflKGEflLFLEPELVIPHRCHRLFTAPTFTGSFLSPUPP 
45 SEG 

PRD cchhhhhhhhhhhhcccchhhhhhhccceeeccceeeccccccccccccccccccccccc 
MEM 

SEfl AVTDASFKVKSHVYSLE6(2DCKYTPMF6PEARTLVLRLAi?LITfiAKHTAKSISI>(2CAESP 
50 SEG •••••••••••••••■••■■•••••••■••••••••••....»•,.... 

PRD cccccccccccceeeccccccccccccccchhhhhhhhhhhhhhhhccccccccccccc^ 
MEM 

SE<2 AGHSFLSblLGFSSIIDTNCSYTANDLDEnGflDSVRKTDEYLEICALEYLRfilFRLSEAiSLRtf 
55 SEG •••■••••••••••»•-••••••••••.......,». 

PRD ccceeecccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhh^ 
MEM . 
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SE<3 FTLALGTTflDENGKK(3LPDCIVGEDGLILTPLGRY(2IINGLRRFEIEYciGDPELt3PIRSY 

SEG 

PRD hhhhhhccccccccccccceeecccccccccccceeeecchhhhheeecccccccccchh 

MEM 

SE<2 EIASLvRTLFRLSSAINHRFAG<2MAALCSRDDFLGSFCRYHLTEPGLASRHLLSPVGRR(2 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccceeeeeeccccchhhhhcccccccc 

MEM 

SEfl VAGHTRGPRLSLRFLGSYRTLVSLLL AFFVASLFCVGPLPCTLLLTLGYVLYASAMTLLT 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccchhhhhhhhhhhhhhhccccccchhhhhhhhhhhhhhhhhhhh 

MEM riMMMMMMIIMMriMMriliriM MMMMMMMMMMMMMMMMM • 



SEQ ERGKLHQP
SEG
PRD hhhccccc
MEM



Prosite for DKFZphtes3_11a17.3

PS00029 404->426 LEUCINE_ZIPPER PDOC00029

(No Pfam data available for DKFZphtes3_11a17.3)
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5 group: signal transduction 



DKFZphtes3_11c22 encodes a novel 482 amino acid protein with
partial similarity to mouse PC326.

The novel protein contains WD-repeats. WD-repeat proteins are
known as regulatory elements in a large variety of pathways. The
repeats form a propeller-like structure, which serves as a
platform for protein/protein interaction. The new protein is
ubiquitously expressed, indicating that it takes an essential
regulatory function in the cell.

The new protein can find application in modulating/blocking of
regulatory pathways.

similarity to mouse PC326
perhaps complete cds.

contains WD-Repeats: cf. BLASTX S37694
perhaps differential polyadenylation

Sequenced by Qiagen



30 



Locus: /map^lqEa-H-SM-S" 
Insert length: 1952 bp

Poly A stretch at pos. 1932, polyadenylation signal at pos. 1912



1 GAAGCAAGTG AGGTTGCACA AAGCAATAGA GGACGAGGAA GATCTCGACC
51 CAGAGGTGGA ACAAGTCAAT CAGATATTTC AACTCTTCCT ACGGTCCCAT
101 CAAGTCCTGA TTTGGAAGTG AGTGAAACTG CAATGGAAGT AGATACTCCA
151 GCTGAACAAT TTCTTCAGCC TTCTACATCC TCTACAATGT CAGCTCAGGC
201 TCATTCGACA TCATCTCCCA CAGAAAGCCC TCATTCTACT CCTTTGCTAT
251 CTTCTCCAGA TAGTGAACAA AGGCAGTCTG TTGAGGCATC TGGACACCAC
301 ACACATCATC AGTCTGATTC TCCTTCTTCT GTGGTTAACA AACAGCTCGG
351 ATCCATGTCA CTTGACGAGC AACAGGATAA CAATAATGAA AAGCTGAGCC
401 CCAAACCAGG GACAGGTGAA CCAGTTTTAA GTTTGCACTA CAGCACAGAA
451 GGAACAACTA CAAGCACAAT AAAACTGAAC TTTACAGATG AATGGAGCAG
501 TATAGCATCA AGTTCTAGAG GAATTGGGAG CCATTGCAAA TCTGAGGGTC
551 AGGAGGAATC TTTCGTCCCA CAGAGCTCAG TGCAACCACC AGAAGGAGAC
601 AGTGAAACAA AAGCTCCTGA AGAATCATCA GAGGATGTGA CAAAATATCA
651 GGAAGGAGTA TCTGCAGAAA ACCCAGTTGA GAACCATATC AATATAACAC
701 AATCAGATAA GTTCACAGCC AAGCCATTGG ATTCCAACTC AGGAGAAAGA
751 AATGACCTCA ATCTTGATCG CTCTTGTGGG GTTCCAGAAG AATCTGCTTC
801 ATCTGAAAAA GCCAAGGAAC CAGAAACTTC AGATCAGACT AGCACTGAGA
851 GTGCTACCAA TGAAAATAAC ACCAATCCTG AGCCTCAGTT CCAAACAGAA
901 GCCACTGGGC CTTCAGCTCA TGAAGAAACA TCCACCAGGG ACTCTGCTCT
951 TCAGGACACA GATGACAGTG ATGATGACCC AGTCCTGATC CCAGGTGCAA
1001 GGTATCGAGC AGGACCTGGT GATAGACGCT CTGCTGTTGC CCGTATTCAG
1051 GAGTTCTTCA GACGGAGAAA AGAAAGGAAA GAAATGGAAG AATTGGATAC
1101 TTTGAACATT AGAAGGCCGC TAGTAAAAAT GGTTTATAAA GGCCATCGCA
1151 ACTCCAGGAC AATGATAAAA GAAGCCAATT TCTGGGGTGC TAACTTTGTA
1201 ATGAGTGGTT CTGACTGTGG CCACATTTTC ATCTGGGATC GGCACACTGC
1251 TGAGCATTTG ATGCTTCTGG AAGCTGATAA TCATGTGGTA AACTGCCTGC
1301 AGCCACATCC GTTTGACCCA ATTTTAGCCT CATCTGGCAT AGATTATGAC
1351 ATAAAGATCT GGTCACCATT AGAAGAGTCA AGGATTTTTA ACCGAAAACT
1401 TGCTGATGAA GTTATAACTC GAAACGAACT CATGCTGGAA GAAACTAGAA
1451 ACACCATTAC AGTTCCAGCC TCTTTCATGT TGAGGATGTT GGCTTCACTT
1501 AATCATATCC GAGCTGACCG GTTGGAGGGT GACAGATCAG AAGGCTCTGG
1551 TCAAGAGAAT GAAAATGAGG ATGAGGAATA ATAAACTCTT TTTGGCAAGC
1601 ACTTAAATGT TCTGAAATTT GTATAAGACA TTTATTATAT TTTTTTCTTT
1651 ACAGAGCTTT AGTGCAATTT TAAGGTTATG GTTTTTGGAG TTTTTCCCTT
1701 TTTTTGGGAT AACCTAACAT TGGTTTGGAA TGATTGTGTG CATGAATTTG
1751 GGAGATTGTA TAAAACAAAA CTAGCAGAAT GTTTTTAAAA CTTTTTGCCG
1801 TGTATGAGGA GTGCTAGAAA ATGCAAAGTG CAATATTTTC CCTAACCTTC
1851 AAATGTGGGA GCTTGGATCA ATGTTGAAGA ATAATTTTCA TCATAGTGAA
1901 AATGTTGGTT CAAATAAATT TCTACACTTG CCAAAAAAAA AAAAAAAAAA
1951 AA

BLAST Results

Entry HS702J1 C ] from database EMBL:
Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone
7D2J1 C J
Score = 2043, P = 5.8e-252, identities = 425/445
10 exons matching Bp 316-1932

Entry HS53bmfl from database EMBL:
human STS WI-6347.
Score = 1203, P = 1.5e-47, identities = 247/252

Entry HS703Hm from database EMBLNEW:
Human DNA sequence from clone 7D3Hm on chromosome 1q23.2-24.3
Score = 1307, P = 1.1e-51, identities = 263/265
2 exons matching Bp 1-316

Medline entries

=i302t3fi3:
Bergsagel PL, Timblin CR, Eckhardt L, Laskov R, Kuehl WM.
Sequence and expression of a murine cDNA encoding PC326, a novel
gene expressed in plasmacytomas but not normal plasma cells.
Oncogene 1992 Oct;7(10):2059-64

Peptide information for frame 1

ORF from 133 bp to 1578 bp; peptide length: 482

Category: similarity to known protein

Classification: Protein management
Prosite motifs: MYB_1 (410-416)

1 MEVDTPAEQF LQPSTSSTMS AQAHSTSSPT ESPHSTPLLS SPDSEQRQSV
51 EASGHHTHHQ SDSPSSVVNK QLGSMSLDEQ QDNNNEKLSP KPGTGEPVLS
101 LHYSTEGTTT STIKLNFTDE WSSIASSSRG IGSHCKSEGQ EESFVPQSSV
151 QPPEGDSETK APEESSEDVT KYQEGVSAEN PVENHINITQ SDKFTAKPLD
201 SNSGERNDLN LDRSCGVPEE SASSEKAKEP ETSDQTSTES ATNENNTNPE
251 PQFQTEATGP SAHEETSTRD SALQDTDDSD DDPVLIPGAR YRAGPGDRRS
301 AVARIQEFFR RRKERKEMEE LDTLNIRRPL VKMVYKGHRN SRTMIKEANF
351 WGANFVMSGS DCGHIFIWDR HTAEHLMLLE ADNHVVNCLQ PHPFDPILAS
401 SGIDYDIKIW SPLEESRIFN RKLADEVITR NELMLEETRN TITVPASFML
451 RMLASLNHIR ADRLEGDRSE GSGQENENED EE
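
The ORF coordinates, the reading frame and the peptide length of a record are tied together by simple arithmetic. The sketch below is a minimal illustration. It assumes that ORF coordinates are 1-based and inclusive and that the stop codon lies outside the stated range, which is a reading of the listing rather than a documented convention. The function names are hypothetical, and the example uses this clone's coordinates read as 133 to 1578.

```python
def peptide_length(orf_start, orf_end):
    """Number of residues encoded by an ORF given 1-based inclusive coordinates."""
    span = orf_end - orf_start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span %d is not a positive multiple of 3" % span)
    return span // 3


def reading_frame(orf_start):
    """Reading frame (1, 2 or 3) of an ORF that starts at a 1-based position."""
    return (orf_start - 1) % 3 + 1


length = peptide_length(133, 1578)
frame = reading_frame(133)
print(length, frame)
```

For 133 to 1578 this gives a peptide length of 482 in frame 1, in agreement with the record. A span that is not a multiple of three raises an error, which is a quick way to spot a misread coordinate.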



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_11c22, frame 1



TREMBLNEli):HSDbb31_l gene: n H3Eb"i Human (H3Eb) mRNA, complete 
cds-, N 

25 = 1-. Score ■ 576-. P = 4e-SE 

PIR:S37bT4 gene PC3Eb protein - mouse-i N = It Score = 2b5i P = 
E.le-ED 

30 PIR:T05b7b hypothetical protein FEDM13.4D - Arabidopsis thaliana-i 
N = 

1-. Score = EMDi P = b-3e-lfl 

35 >TREMBLNEU:HSDbb31_l gene: "H3Eb"n Human (H3Eb) mRNA-, complete 
cds- 

Length = 5^7 



HSPs: 

Score = E76 (m.7 bits)-. Expect ■ M-Oe-EEi P = 4 • De-EE 
Identities = b3/146 (4EZ)-. Positives = ^4/146 <b35<) 



fluery: 335 YKGHRNSRTMIKEANFUG— 
45 ANFVMSGSDCGHIFIWDRHTAEHLMLLEADNH-VVNCLdP 311 

YKGHRN+ T +K NF+G + FV+SGSDCGHIF+W++ + + + +E P 

VVNCL+P 

Sbjct: 4ES YKGHRNNAT- 

VKGVNFYGPKSEF VVSGSDCGHIFLUEKSSCQIIOFFIEGDKGGVVNCLEP 4flb 

50 

fluery: 3^E HPFDPILASSGIDYDIKIUSPLEESRIFNRKLADEVTTRNELMLEE- 
TRNTITVPASFML 150 

HP P+LA+SG+D+D+KIU+P E+ L D VI +N+ +E + + 

+ S ML 

55 Sbjct: 4fi7 HPHLPVLATSGLDHDvKIblAPTAEASTELTGLKD- 
VIKKNKRERDEDSLHflTDLFDSHML 545 



(Juery: 451 RMLASLNHIR AD RLEGD-RSEGSGtJENENEDE 461 
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L ++H+R R R G G + + DE 

Sbjct: 54b UFL--nHHLRt2RRHHRRWREPGVGATDADSDE 575 



Pedant information for DKFZphtes3_11c22, frame 1

Report for DKFZphtes3_11c22.1

[LENGTH] 482
[MW] 53470.92
[pI] 4.72

EHOMOLID PIR:TD4 t lbl hypothetical protein T12J5-1D - 

15 Arabidopsis thaliana 2e-22 

[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YDL145c] 4e-05
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL145c] 4e-05
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YCL039w] 2e-04
[SUPFAM] WD repeat homology 4e-21
[PROSITE] MYB_1 1
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 17.01 %



SEQ MEVDTPAEQFLQPSTSSTMSAQAHSTSSPTESPHSTPLLSSPDSEQRQSVEASGHHTHHQ
SEG XXXXXXXXXXXXXXXXXXXXX - -
PRD cccccceeeeecccccceeeeeeccccccccccccceeecccccchhhhhhccccceeec

SEQ SDSPSSVVNKQLGSMSLDEQQDNNNEKLSPKPGTGEPVLSLHYSTEGTTTSTIKLNFTDE
SEG
PRD ccccceeeeecccccccccccccccccccccccccccceeeeccccccccccceeeeccc

SEQ WSSIASSSRGIGSHCKSEGQEESFVPQSSVQPPEGDSETKAPEESSEDVTKYQEGVSAEN
SEG • xxxxxxxxxxxx
PRD cccccccccccccccccccceeeeeccccccccccccccccccccccccccccccccccc

SEQ PVENHINITQSDKFTAKPLDSNSGERNDLNLDRSCGVPEESASSEKAKEPETSDQTSTES
SEG X X X X X XXX xxxxx X • • • • xxxxx
PRD ccceeeeeecccccccccccccccccccccccccccccchhhhhhhhccccccccccccc

SEQ ATNENNTNPEPQFQTEATGPSAHEETSTRDSALQDTDDSDDDPVLIPGARYRAGPGDRRS
SEG XXXXXXXX XXX XXXXX . .
PRD cccccccccccceeeeeccccccccccccccccccccccccccccccccccccccccchh

SEQ AVARIQEFFRRRKERKEMEELDTLNIRRPLVKMVYKGHRNSRTMIKEANFWGANFVMSGS
SEG XXXXXXXXXXXXXX
PRD hhhhhhhhhhhhhhhhhhhhhhhhccccceeeeeeccccccceeeeccccccceeeeccc

SEQ DCGHIFIWDRHTAEHLMLLEADNHVVNCLQPHPFDPILASSGIDYDIKIWSPLEESRIFN
SEG
PRD ccceeeeeecchhhhhhhhhcccceeeecccccccceeecccccceeeecccchhhhhhh

SEQ RKLADEVITRNELMLEETRNTITVPASFMLRMLASLNHIRADRLEGDRSEGSGQENENED
SEG
PRD hhchhhhhhhhhhhhhhhhcceeecchhhhhhhhhhchhhhhhccccccccccccccccc

SEQ EE
SEG - -
PRD cc



Prosite for DKFZphtes3_11c22.1

PS00037 410->417 MYB_1 PDOC00037

(No Pfam data available for DKFZphtes3_11c22.1)
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DKFZphtes3_lld21 



group: signal transduction

DKFZphtes3_11d21 encodes a novel 922 amino acid protein and contains
the full coding sequence of the human Nedd-4-like ubiquitin-protein
ligase.

The novel protein contains four WW domains. The WW/rsp5/WWP
domain has been shown to bind proteins with particular proline
motifs, and thus somewhat resembles SH3 domains. It is
frequently associated with other domains typical for proteins in
signal transduction processes. There is also a ubiquitin-protein
ligase activity reported. The protein is believed to play an
important role in protein-degradation pathways.

The new protein can find application in diagnosis of diseases due
to abnormal protein degradation like muscular dystrophy or
multiple sclerosis as well as in modulating the half life of
specific proteins and in expression profiling.

similarity to Nedd-4-like ubiquitin-protein ligase (Homo sapiens)
Sequenced by Qiagen
Locus: unknown

Insert length: 3382 bp

Poly A stretch at pos. 3362, polyadenylation signal at pos. 3345



1 ATTTTGGGAC ATGGCCACTG CTTCACCAAG GTCTGATACT AGTAATAACC
51 ACAGTGGAAG GTTGCAGTTA CAGGTAACTG TTTCTAGTGC CAAACTTAAA
101 AGAAAAAAGA ACTGGTTCGG AACAGCAATA TATACAGAAG TAGTTGTAGA
151 TGGAGAAATT ACGAAAACAG CAAAATCCAG TAGTTCTTCT AATCCAAAAT
201 GGGATGAACA GCTAACTGTA AATGTTACGC CACAGACTAC ATTGGAATTT
251 CAAGTTTGGA GCCATCGCAC TTTAAAAGCA GATGCTTTAT TAGGAAAAGC
301 AACGATAGAT TTGAAACAAG CTCTGTTGAT ACACAATAGA AAATTGGAAA
351 GAGTGAAAGA ACAATTAAAA CTTTCCTTGG AAAACAAGAA TGGCATAGCA
401 CAAACTGGTG AATTGACAGT TGTGCTTGAT GGATTGGTGA TTGAGCAAGA
451 AAATATAACA AACTGCAGCT CATCTCCAAC CATAGAAATA CAGGAAAATG
501 GTGATGCCTT ACATGAAAAT GGAGAGCCTT CAGCAAGGAC AACTGCCAGG
551 TTGGCTGTTG AAGGCACGAA TGGAATAGAT AATCATGTAC CTACAAGCAC
601 TCTAGTCCAA AACTCATGCT GCTCGTATGT AGTTAATGGA GACAACACAC
651 CTTCATCTCC GTCTCAGGTT GCTGCCAGAC CCAAAAATAC ACCAGCTCCA
701 AAACCACTCG CATCTGAGCC TGCCGATGAC ACTGTTAATG GAGAATCATC
751 CTCATTTGCA CCAACTGATA ATGCGTCTGT CACGGGTACT CCAGTAGTGT
801 CTGAAGAAAA TGCCTTGTCT CCAAATTGCA CTAGTACTAC TGTTGAAGAT
851 CCTCCAGTTC AAGAAATACT GACTTCCTCA GAAAACAATG AATGTATTCC
901 TTCTACCAGT GCAGAATTGG AATCTGAAGC TAGAAGTATA TTAGAGCCTG
951 ACACCTCTAA TTCTAGAAGT AGTTCTGCTT TTGAAGCAGC CAAATCAAGA
1001 CAGCCAGATG GGTGTATGGA TCCTGTACGG CAGCAGTCTG GGAATGCCAA
1051 CACAGAAACC TTGCCATCAG GGTGGGAACA AAGAAAAGAT CCTCATGGTA
1101 GAACCTATTA TGTGGATCAT AATACTCGAA CTACCACATG GGAGAGACCA
1151 CAACCTTTAC CTCCAGGTTG GGAAAGAAGA GTTGATGATC GTAGAAGAGT
1201 TTATTATGTG GATCATAACA CCAGAACAAC AACGTGGCAG CGGCCTACCA
1251 TGGAATCTGT CCGAAATTTT GAACAGTGGC AATCTCAGCG GAACCAATTG
1301 CAGGGAGCTA TGCAACAGTT TAACCAACGA TACCTCTATT CGGCTTCAAT
1351 GTTAGCTGCA GAAAATGACC CTTATGGACC TTTGCCACCA GGCTGGGAAA
1401 AAAGAGTGGA TTCAACAGAC AGGGTTTACT TTGTGAATCA TAACACAAAA
1451 ACAACCCAGT GGGAAGATCC AAGAACTCAA GGCTTACAGA ATGAAGAACC
1501 CCTGCCAGAA GGCTGGGAAA TTAGATATAC TCGTGAAGGT GTAAGGTACT
1551 TTGTTGATCA TAACACAAGA ACAACAACAT TCAAAGATCC TCGCAATGGG
1601 AAGTCATCTG TAACTAAAGG TGGTCCACAA ATTGCTTATG AACGCGGCTT
1651 TAGGTGGAAG CTTGCTCACT TCCGTTATTT GTGCCAGTCT AATGCACTAC
1701 CTAGTCATGT AAAGATCAAT GTGTCCCGGC AGACATTGTT TGAAGATTCC
1751 TTCCAACAGA TTATGGCATT AAAACCCTAT GACTTGAGGA GGCGCTTATA
1801 TGTAATATTT AGAGGAGAAG AAGGACTTGA TTATGGTGGC CTAGCGAGAG
1851 AATGGTTTTT CTTGCTTTCA CATGAAGTTT TGAACCCAAT GTATTGCTTA
1901 TTTGAGTATG CGGGCAAGAA CAACTATTGT CTGCAGATAA ATCCAGCATC
1951 AACCATTAAT CCAGACCATC TTTCATACTT CTGTTTCATT GGTCGTTTTA
2001 TTGCCATGGC ACTATTTCAT GGAAAGTTTA TCGATACTGG TTTCTCTTTA
2051 CCATTCTACA AGCGTATGTT AAGTAAAAAA CTTACTATTA AGGATTTGGA
2101 ATCTATTGAT ACTGAATTTT ATAACTCCCT TATCTGGATA AGAGATAACA
2151 ACATTGAAGA ATGTGGCTTA GAAATGTACT TTTCTGTTGA CATGGAGATT
2201 TTGGGAAAAG TTACTTCACA TGACCTGAAG TTGGGAGGTT CCAATATTCT
2251 GGTGACTGAG GAGAACAAAG ATGAATATAT TGGTTTAATG ACAGAATGGC
2301 GTTTTTCTCG AGGAGTACAA GAACAGACCA AAGCTTTCCT TGATGGTTTT
2351 AATGAAGTTG TTCCTCTTCA GTGGCTACAG TACTTCGATG AAAAAGAATT
2401 AGAGGTTATG TTGTGTGGCA TGCAGGAGGT TGACTTGGCA GATTGGCAGA
2451 GAAATACTGT TTATCGACAT TATACAAGAA ACAGCAAGCA AATCATTTGG
2501 TTTTGGCAGT TTGTGAAAGA GACAGACAAT GAAGTAAGAA TGCGACTATT
2551 GCAGTTCGTC ACTGGAACCT GCCGTTTACC TCTAGGAGGA TTTGCTGAGC
2601 TCATGGGAAG TAATGGGCCT CAAAAGTTTT GCATTGAAAA AGTTGGCAAA
2651 GACACTTGGT TACCAAGAAG CCATACATGT TTTAATCGCT TGGATCTACC
2701 ACCATATAAG AGTTATGAAC AACTAAAGGA AAAACTTCTT TTTGCAATAG
2751 AAGAGACAGA GGGATTTGGA CAAGAATGAA TGTGGCTTCT TATTTTGGAG
2801 GAGCTCTTGC ATTTAAATAC CCCAGCCAAG AAAAATTGCA CAGATAGTGT
2851 ATATAAGCTG TTCATTCTGT ACAGTGAATT TTCCGAACCT CTCAAAGTAT
2901 GTTTTCCGTT CTTCCACAGA AATATGCAAA ACAGTTCATC CTTTTCTACT
2951 TTATTTATTG TTCCCTTGAA ATGACTGACC AGGAAAAAGA TCATCCTTAA
3001 ATTTTGAAGC AAGTGAGAGA CTTTATTAAA AATACATATA TATCTATATA
3051 AACATATATG ATAGTGGCTC TAGTTTTATA GAGCTCCAAG TGTATTAAAC
3101 ATGACAGCCA TTCATTCATA AAGATCTGGA TTTGCTTTAC CTTGTTAATA
3151 TTATCTAGGG GAAAAAGTGC AAATTGCTCC ATGTTCTTCT CTCCCTTATG
3201 TAACATCTCC TGAGGGTGTT TAGTTGCATG GCTGTTCAGA AAGGTATTAA
3251 GGGCTTAGGC CAAATCTTAC TTTGAGTATG TTAAAAAAAA AAAAATGCTG
3301 CTGGCTTTTC TGAAGACAGG TGCTTGAACT TGTCAGTTTG TTTTAAATAA
3351 ATACAATAGT TGAAAAAAAA AAAAAAAAAA AA



BLAST Results

No BLAST result

Medline entries

97313427:
Pirozzi G, McConnell SJ, Uveges AJ, Carter JM, Sparks AB, Kay BK,
Fowlkes DM. Identification of novel human WW domain-containing
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Peptide information for frame 2 



ORF from 11 bp to 2776 bp; peptide length: 922
Category: known protein
Classification: Protein management
Prosite motifs: WW_DOMAIN_1 (355-380)
WW_DOMAIN_1 (387-412)
WW_DOMAIN_1 (462-487)
WW_DOMAIN_1 (502-527)

1 MATASPRSDT SNNHSGRLQL QVTVSSAKLK RKKNWFGTAI YTEVVVDGEI
51 TKTAKSSSSS NPKWDEQLTV NVTPQTTLEF QVWSHRTLKA DALLGKATID
101 LKQALLIHNR KLERVKEQLK LSLENKNGIA QTGELTVVLD GLVIEQENIT
151 NCSSSPTIEI QENGDALHEN GEPSARTTAR LAVEGTNGID NHVPTSTLVQ
201 NSCCSYVVNG DNTPSSPSQV AARPKNTPAP KPLASEPADD TVNGESSSFA
251 PTDNASVTGT PVVSEENALS PNCTSTTVED PPVQEILTSS ENNECIPSTS
301 AELESEARSI LEPDTSNSRS SSAFEAAKSR QPDGCMDPVR QQSGNANTET
351 LPSGWEQRKD PHGRTYYVDH NTRTTTWERP QPLPPGWERR VDDRRRVYYV
401 DHNTRTTTWQ RPTMESVRNF EQWQSQRNQL QGAMQQFNQR YLYSASMLAA
451 ENDPYGPLPP GWEKRVDSTD RVYFVNHNTK TTQWEDPRTQ GLQNEEPLPE
501 GWEIRYTREG VRYFVDHNTR TTTFKDPRNG KSSVTKGGPQ IAYERGFRWK
551 LAHFRYLCQS NALPSHVKIN VSRQTLFEDS FQQIMALKPY DLRRRLYVIF
601 RGEEGLDYGG LAREWFFLLS HEVLNPMYCL FEYAGKNNYC LQINPASTIN
651 PDHLSYFCFI GRFIAMALFH GKFIDTGFSL PFYKRMLSKK LTIKDLESID
701 TEFYNSLIWI RDNNIEECGL EMYFSVDMEI LGKVTSHDLK LGGSNILVTE
751 ENKDEYIGLM TEWRFSRGVQ EQTKAFLDGF NEVVPLQWLQ YFDEKELEVM
801 LCGMQEVDLA DWQRNTVYRH YTRNSKQIIW FWQFVKETDN EVRMRLLQFV
851 TGTCRLPLGG FAELMGSNGP QKFCIEKVGK DTWLPRSHTC FNRLDLPPYK
901 SYEQLKEKLL FAIEETEGFG QE
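
The link between the nucleotide listing and the peptide can be illustrated by translating the start of the ORF. The sketch below is a minimal example under stated assumptions: it uses the standard genetic code, takes the ORF start as 1-based position 11, and translates only the first row of the nucleotide sequence for this clone. The names `CODON_TABLE`, `translate` and `ROW_1` are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, orf_start):
    """Translate seq from the 1-based orf_start up to a stop codon or the end."""
    residues = []
    for i in range(orf_start - 1, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        residues.append(amino_acid)
    return "".join(residues)


# First row (positions 1-50) of the DKFZphtes3_11d21 nucleotide listing.
ROW_1 = "ATTTTGGGACATGGCCACTGCTTCACCAAGGTCTGATACTAGTAATAACC"
prefix = translate(ROW_1, 11)
print(prefix)
```

The thirteen complete codons in this row translate to MATASPRSDTSNN, which matches the start of the peptide listed above.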

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_11d21, frame 2
No Alert BLASTP hits found
Pedant information for DKFZphtes3_11d21, frame 2

Report for DKFZphtes3_11d21.2

[LENGTH] 925

(EMU ID 105b5D.5S 

EpII 5. bO 
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WO 01/98454 PCT/TB01/02050 

[HOMOL] TREMBL:HSU96113_1 product: "WWP1"; Homo sapiens Nedd-4-like ubiquitin-protein ligase WWP1 mRNA, partial cds. 0.0
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YER125w] 1e-149
[FUNCAT] 11.01 stress response [S. cerevisiae, YER125w] 1e-149
[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YER125w] 1e-149
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YER125w] 1e-149
[FUNCAT] 06.07 protein modification (glycolsylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YER125w] 1e-149
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YDR457w] 1e-78
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YJR036c] 7e-39
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YKL010c] 8e-21
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL012w] 6e-05
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YKL012w] 6e-05
[FUNCAT] 30.01 organization of cell wall [S. cerevisiae, YIR019c] 3e-04
[FUNCAT] 30.90 extracellular/secretion proteins [S. cerevisiae, YIR019c] 3e-04
[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YIR019c] 3e-04
30 EBL0CKS3 BP037MLE 
EBLOCKSH BPD37blG 

EBL0CKSJ BLDDSmE Fibrinogen beta and gamma chains C-terminal 

domain proteins 

EBL0CKSD PR0D731B 
35 EBL0CKSJ BPDISttC 

EBL0CKSD BLOUSE Uld/r sp5/UUP domain proteins 

EBL0CKS3 PRDD l 4D3B 

EBL0CKS3 PR00403A 

EBL0CKS3 PF00b32B 
40 EBL0CKS3 PF00b32A 

[EC] 6.3.2.19 Ubiquitin--protein ligase 1e-151
[PIRKW] ligase 1e-151
[PIRKW] transmembrane protein 2e-37
[PIRKW] leucine zipper 2e-28
[SUPFAM] WW repeat homology 1e-151
[SUPFAM] WD repeat homology 2e-26
[SUPFAM] ubiquitin ligase homolog 1e-151
[PROSITE] WW_DOMAIN_1 4
[PFAM] WW/rsp5/WWP domain containing proteins
[PFAM] C2 domain
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 1.41 %

SEQ FWDMATASPRSDTSNNHSGRLQLQVTVSSAKLKRKKNWFGTAIYTEVVVDGEITKTAKSS

SEG 

PRD ccccccccccccccccccceeeeeehhhhhhhhhhhhccccceeeeeeeccccceeeecc 
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WO 01/98454 



SEQ SSSNPKWDEQLTVNVTPQTTLEFQVWSHRTLKADALLGKATIDLKQALLIHNRKLERVKE
SEG
PRD ccccccccceeeeeccccceeeeeeecchhhhhhhhhhhhhhhhhhhhhhhchhhhhhhh

SEQ QLKLSLENKNGIAQTGELTVVLDGLVIEQENITNCSSSPTIEIQENGDALHENGEPSART
SEG
PRD hhhhhhcccccccccceeeeeecceeeeeeeccccccccceeeecccccccccccccchh

SEQ TARLAVEGTNGIDNHVPTSTLVQNSCCSYVVNGDNTPSSPSQVAARPKNTPAPKPLASEP
SEG
PRD hhhhhhcccccccccccccccccccccccccccccccccccccccccccccccccccccc

SEQ ADDTVNGESSSFAPTDNASVTGTPVVSEENALSPNCTSTTVEDPPVQEILTSSENNECIP
SEG
PRD cccccccccccccccccceeeccccccccccccccccccccccccccccccccccccccc

SEQ STSAELESEARSILEPDTSNSRSSSAFEAAKSRQPDGCMDPVRQQSGNANTETLPSGWEQ
SEG
PRD ccccccccceeeeccccccccccccccccccccccccccccccccccccccccccccccc

SEQ RKDPHGRTYYVDHNTRTTTWERPQPLPPGWERRVDDRRRVYYVDHNTRTTTWQRPTMESV
SEG
PRD ccccccceeeecccccccccccccccccccccccccccceeeeecccccccccccccccc

SEQ RNFEQWQSQRNQLQGAMQQFNQRYLYSASMLAAENDPYGPLPPGWEKRVDSTDRVYFVNH
SEG
PRD hhhhhhhhhhhhhhhhhhhcccccccccccccccccccccccccceeeeccccceeeeec

SEQ NTKTTQWEDPRTQGLQNEEPLPEGWEIRYTREGVRYFVDHNTRTTTFKDPRNGKSSVTKG
SEG
PRD ccceeeecccccccccccccccccceeeeecccceeeeeccceeeeeccccccccccccc

SEQ GPQIAYERGFRWKLAHFRYLCQSNALPSHVKINVSRQTLFEDSFQQIMALKPYDLRRRLY
SEG
PRD cccccchhhhhhhhhhhhhhhhhcccccceeeeehhhhhhhhhhhhhhhhcchhhhhhhh

SEQ VIFRGEEGLDYGGLAREWFFLLSHEVLNPMYCLFEYAGKNNYCLQINPASTINPDHLSYF
SEG
PRD hhhccccccccccchhhhhhhhhhhccccccceeeeecccceeeeecccccccccceeee

SEQ CFIGRFIAMALFHGKFIDTGFSLPFYKRMLSKKLTIKDLESIDTEFYNSLIWIRDNNIEE
SEG
PRD hhhhhhhhhhhhhhhccccccchhhhhhhhhhhhhhcccccccchhhhhheeeeeccccc

SEQ CGLEMYFSVDMEILGKVTSHDLKLGGSNILVTEENKDEYIGLMTEWRFSRGVQEQTKAFL
SEG
PRD chhhhhhhhhccccceeeeeeeccccceeeeeeccchhhhhhhhhhhhhhhhhhhhhhhh

SEQ DGFNEVVPLQWLQYFDEKELEVMLCGMQEVDLADWQRNTVYRHYTRNSKQIIWFWQFVKE
SEG
PRD hhhhhcccccchhhhhhhhhhhhhhcccccccccccccccccccccchhhhhhhhhhhhh

SEQ TDNEVRMRLLQFVTGTCRLPLGGFAELMGSNGPQKFCIEKVGKDTWLPRSHTCFNRLDLP
SEG
PRD hchhhhhhhhhhhccccccccccceeeecccccceeeeeecccccccccccccccccccc

SEQ PYKSYEQLKEKLLFAIEETEGFGQE
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PRD cccchhhhhhhhhhhhhhccccccc 
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Prosite for DKFZphtes3_lld21 . 2 



PS01159 358->384 WW_DOMAIN_1 PDOC50020
PS01159 390->416 WW_DOMAIN_1 PDOC50020
PS01159 465->491 WW_DOMAIN_1 PDOC50020
PS01159 505->531 WW_DOMAIN_1 PDOC50020

Pfam for DKFZphtes3_11d21.2



HMM_NAME C2 domain 



Hiin 

20 *LtVrIIeARNLIdkM]>MnGf SDPYVKVdMdPdpkDtkKWKTkTildNN • GL 

L V++ +A+ +K++++G+ Y +V TKT 

+++ + 

(3uery 53 L(3VTVSSAKLKRKKNWFGTA-IYTE VVVDGE 

ITKTAKSSSSS b3 



HMM NPVUNEEeFvFedlPyPdlqrkMLRFaVtiJDliJDRFSRBDFIGHCi* 

NP E + + + + + + L + F + VU + ++ + ++G ++ 

<2uery NPKUD-EflLTVN VTPtfTT — LEFtfVWSHRTLK AD ALLGKAT 

1D1 



HMM__NAME UW/rsp5/UhJP domain containing proteins 

35 HMfl *LPsGUEeHUDpsGRpUYYWNHETkTT(2UEpP* 

LPSGUE+++DP GR + YY++H+T+TT+UE+P 
duery 35M LPSGb)E(2RKDPHGRT-YYVDHNTRTTTliJERP 333 

50. 3flb MIS 1 31 dkf zphtes3_lld21 . 2 similarity to 

40 Nedd-M-like ubiquitin-protein ligase (Homo sapiens) 
Alignment to HMfl consensus: 
fluery *LPsGWEeHUDpsGRpUYYUNHETkTT<2UJEpP* 

LP+GUIE + + J>+ R YY++H + T + TT + U++P 
dkfzphtes3 3flt LPPGUERRVDDRRRV-YYVDHNTRTTTUG2RP MIS 

45 

duery M^O 1 31 dkf zphtes3_lldBl . 2 similarity to 

Nedd-M-like ubiqui tin-protein ligase (Homo sapiens) 

Alignment to HMM consensus: 
HMM ' *LPsGUEeHb!DpsGRpli)YYWNHETkTT(2l)EpP* 
50 LP+GUE++ D + R Y ++NH+TKTT<3UE + P 

Query Mtl LPPGWEKRVDSTDRV-YFVNHNTKTTtfWEDP M^D 

3B.fc,2 SD1 530 1 31 dkf zphtes3_lldBl . 2 similarity to 

Nedd-M-like ubiqui tin-protein ligase (Homo sapiens) 
55 Alignment to HMM consensus: 

fluery *LPsGli)EeHUIDpsGRpUYYU)NHETkTT(2blEpP* 

LP GblE + + + +G + Y +++H + T + TT+ + + P 
dkfzphtes3 501 LPEGWEIRYTREGVR-YFVDHNTRTTTFKDP 530 
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PAGE INTENTIONALLY LEFT BLANK 
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WO 01/98454 PCT/IB01/02050 
DKFZphtes3_11e17

group: testis derived

DKFZphtes3_11e17 encodes a novel 573 amino acid protein without
similarity to known proteins.

No informative BLAST results; no predictive Prosite, Pfam or SCOP
motif.

The new protein can find application in studying the expression
profile of testis-specific genes.

unknown protein
Sequenced by Qiagen
Locus: unknown
Insert length: 2102 bp

Poly A stretch at pos. 2080, polyadenylation signal at pos.



1 GGCCTGGGGG GCTTCCCTGG GGGGCTTGTC GCCGGGGCCG CCTGGGCTTT
51 CAGGTCTTCC GAGGCTGACA TTCACGTTTC ATTCTGCCAC ACTCGGGAAC
101 GGTGATCGGG GAAGCATGGG GATCCGGGAG AAGCACCCAC AAAACTAGCA
151 TCCTCCTGGA GGAGCTCGGG AATAGGATGA GTGATAATCC ACCCAGAATG
201 GAAGTGTGTC CTTACTGTAA GAAGCCATTT AAACGATTAA AATCCCACTT
251 GCCATACTGT AAGATGATAG GATCAACCAT ACCTACTGAT CAAAAAGTTT
301 ATCAGTCCAA GCCAGCTACA CTCCCACGTG CTAAAAAGAT GAAAGGACCA
351 ATCAAAGATT TAATTAAAGC TAAAGGGAAA GAGTTAGAGA CAGAGAATGA
401 AGAAAGAAAT TCTAAGTTGG TGGTGGACAA ACCAGAACAG ACAGTGAAGA
451 CCTTTCCACT GCCAGCTGTT GGTTTGGAAA GAGCAGCTAC TACAAAGGCA
501 GATAAAGACA TCAAGAATCC AATCCAACCA TCCTTCAAAA TGTTAAAAAA
551 TACTAAACCA ATGACTACTT TCCAAGAAGA AACCAAGGCT CAGTTTTACG
601 CATCAGAGAA AACCTCTCCT AAAAGAGAAC TTGCCAAAGA TTTGCCTAAA
651 TCAGGAGAAA GTCGATGTAA TCCTTCAGAA GCTGGAGCGT CTTTACTGGT
701 TGGCTCAATA GAACCTTCTT TGTCAAATCA AGATAGAAAA TATTCCTCAA
751 CTCTACCTAA TGATGTACAA ACTACCTCTG GTGATCTCAA ATTGGACAAA
801 ATTGATCCCC AAAGACAGGA ACTTCTAGTA AAATTACTAG ATGTGCCTAC
851 TGGTGATTGT CATATTTCTC CAAAGAATGT CAGTGATGGG GTTAAAAGGG
901 TAAGAACATT ATTAAGCAAT GAGAGAGATT CCAAAGGCAG GGATCACCTC
951 TCAGGAGTCC CTACTGATGT TACAGTTACT GAGACTCCAG AAAAGAACAC
1001 AGAATCCCTC ATTTTAAGCC TTAAAATGAG CTCATTAGGT AAAATCCAAG
1051 TCATGGAGAA ACAAGAGAAA GGACTTACCC TGGGAGTAGA GACGTGTGGG
1101 AGCAAAGGAA ATGCAGAGAA AAGTATGTCT GCAACAGAAA AGCAGGAACG
1151 GACTGTCATG AGCCATGGCT GTGAGAACTT CAACACCAGG GATTCAGTCA
1201 CAGGAAAGGA GTCTCAAGGG GAAAGACCAC ATTTAAGTTT GTTCATTCCG
1251 AGGGAGACGA CTTACCAGTT TCATTCTGTA TCGCAGTCAA GTAGTCAAAG
1301 TCTTGCCTCT CTAGCTACAA CATTTCTTCA AGAAAAGAAA GCAGAAGCCC
1351 AGAATCATAA TTGTGTCCCT GATGTAAAGG CATTAATGGA GAGTCCCGAG
1401 GGACAGTTAT CTCTGGAGCC CAAATCTGAT AGTCAGTTCC AAGCATCACA
1451 CACTGGGTGC CAGAGCCCTT TATGTTCAGC CCAGCGTCAC ACTCCTCAGA
1501 GCCCCTTCAC CAATCATGCT GCAGCTGCTG GCAGGAAGAC TCTTCGCAGC
1551 TGCATGGGGC TGGAGTGGTT TCCAGAGCTC TATCCTGGTT ACCTTGGACT
1601 AGGGGTGTTG CCAGGGAAGC CTCAGTGTTG GAATGCAATG ACCCAGAAGC
1651 CACAACTTAT CAGTCCCCAG GGGGAAAGAC TCTCACAAGG CTGGATCAGG
1701 TGCAACACCA CCATAAGGAA GAGTGGATTC GGTGGCATCA CTATGCTCTT
1751 CACAGGATAC TTCGTCCTGT GTTGTAGCTG GAGTTTCAGA CGTCTGAAAA
1801 AATTGTGCCG ACCCCTGCCC TGGAAGAGCA CAGTACCTCC ATGCATTGGT
1851 GTGGCGAAGA CGACTGGGGA TTGCCGCTCT AAAACATGTT TGGATTAGGA
1901 AGCACGTTTA AGTAGGAGAA GCCTTCGTGA CTTCTCTCTA GTGCCTTCGT
1951 GCCCTGTGTT GCCCACTGAA TTGCCCTGTA ACACCTAAGT GTAGTGGTAG
2001 CATTAAGGGA TAGCTTTTCA GCCCTCAAGG TTATCAGGAG CATTTGTATC
2051 ACTGCTATAA ATAAAGTAGT ATCACTTGTC ATAAAAAAAA AAAAAAAAAA
2101 AA
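
In a sequence listing of this layout each row label should equal the previous label plus 50, and the insert length follows from the last row. The sketch below is a minimal consistency check under those assumptions (rows of 50 residues in space-separated blocks, 1-based labels). The function name `insert_length` and its parameters are hypothetical, and the example uses the final two rows of this clone.

```python
def insert_length(rows, first_position=1, row_width=50):
    """Check row labels against the expected positions and return the position
    of the last residue. Each row is 'label block block ...'."""
    expected = first_position
    last_end = first_position - 1
    for number, row in enumerate(rows):
        label, *blocks = row.split()
        if int(label) != expected:
            raise ValueError("row labelled %s, expected %d" % (label, expected))
        residues = "".join(blocks)
        is_last = number == len(rows) - 1
        if not is_last and len(residues) != row_width:
            raise ValueError("row %s holds %d residues" % (label, len(residues)))
        last_end = expected + len(residues) - 1
        expected += row_width
    return last_end


ROWS = [
    "2051 ACTGCTATAA ATAAAGTAGT ATCACTTGTC ATAAAAAAAA AAAAAAAAAA",
    "2101 AA",
]
total = insert_length(ROWS, first_position=2051)
print(total)
```

For these two rows the check returns 2102, in agreement with the stated insert length. A mislabelled row raises an error instead of silently shifting every later position.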



BLAST Results
No BLAST result
Medline entries

No Medline entry

Peptide information for frame 3

ORF from 177 bp to 1895 bp; peptide length: 573
Category: putative protein
Classification: no clue

1 MSDNPPRMEV CPYCKKPFKR LKSHLPYCKM IGSTIPTDQK VYQSKPATLP
51 RAKKMKGPIK DLIKAKGKEL ETENEERNSK LVVDKPEQTV KTFPLPAVGL
101 ERAATTKADK DIKNPIQPSF KMLKNTKPMT TFQEETKAQF YASEKTSPKR
151 ELAKDLPKSG ESRCNPSEAG ASLLVGSIEP SLSNQDRKYS STLPNDVQTT
201 SGDLKLDKID PQRQELLVKL LDVPTGDCHI SPKNVSDGVK RVRTLLSNER
251 DSKGRDHLSG VPTDVTVTET PEKNTESLIL SLKMSSLGKI QVMEKQEKGL
301 TLGVETCGSK GNAEKSMSAT EKQERTVMSH GCENFNTRDS VTGKESQGER
351 PHLSLFIPRE TTYQFHSVSQ SSSQSLASLA TTFLQEKKAE AQNHNCVPDV
401 KALMESPEGQ LSLEPKSDSQ FQASHTGCQS PLCSAQRHTP QSPFTNHAAA
451 AGRKTLRSCM GLEWFPELYP GYLGLGVLPG KPQCWNAMTQ KPQLISPQGE
501 RLSQGWIRCN TTIRKSGFGG ITMLFTGYFV LCCSWSFRRL KKLCRPLPWK
551 STVPPCIGVA KTTGDCRSKT CLD



BLASTP hits
No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_11e17, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphtes3_11e17, frame 3
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Report for DKFZphtes3_llel7 - 3 

[LENGTH] 573

CM bl 3 b33fi c 3.flfi 
EpI3 

[BLOCKS] BL00028 Zinc finger, C2H2 type, domain proteins
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 7.50 %
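
The LOW_COMPLEXITY keyword can be read as the share of residues that the SEG lines of the report mask with `x`. This reading is an assumption inferred from the report layout, not something the listing states. A minimal sketch, with the hypothetical helper `low_complexity_percent`:

```python
def low_complexity_percent(seg_masks, protein_length):
    """Percentage of residues flagged 'x' across all SEG mask lines."""
    masked = sum(mask.count("x") for mask in seg_masks)
    return round(100.0 * masked / protein_length, 2)


# Mask runs as they appear in the SEG lines of this report (12, 2 + 12, 15 and 2 residues).
masks = ["x" * 12, "xx xxxxxxxxxxxx", "x" * 15, "xx"]
print(low_complexity_percent(masks, 573))
```

With 43 masked positions in a 573-residue report, the result is about 7.5 %.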

SE<2 MSDNPPRnEVCPYCKKPFKRLKSHLPYCKniGSTIPT])(3KVY(3SKPATLPRAKKf1K6PIK 
SEG 

15 PRD ccccccceeecccccchhhhhhhcccceeeeccccccceeeeeccccchhhhhhhcccch 

SE<3 DLIKAK6KELETENEERNSKLVVDKPEt3TVKTFPLPAVGLERAATTKADKDIKNPI(3PSF 
SEC 

PRD hhhhhhcccchhhhhhhhheeeeccccceeecccccchhhhhhhhhhhcccccccccchh 

20 

SE<2 KnLKNTK:PMTTF(3EETKA(2FYASEKTSPJCRELAICDLPKSGESRCNPSEA6ASLLVGSIEP 

SEG 

PRD hhhhcccccchhhhhhhhhhhhhcccccchhhhhccccccccccccccchhhhhhhcccc 

25 SE(3 SLSNfiDRKYSSTLPNDV(JTTSGDLKLDKIDP<3R(3ELLV<LLDVPTGDCHISPKNVSDGVIC 
SEG 

PRD ccccccceeecccccccccccccccccccccchhhhhhhhhccccccccccccccccchh 

SE(2 RVRTLLSNERDSICGRDHLSGVPTDVTVTETPEKNTESLILSLICnSSLGICI(3VMEIC(2EKGL 
30 SEG xxxxxxxxxxxx 

PRD hhhhhhhhcccccccccccccccceeeeeccccchhhhhhhhhhccccchhhhhhhhccc 

SE(2 TLGVETCGSKGNAEKSnSATEKfiERTVnSHGCENFNTRDSVTGKEStfGERPHLSLFIPRE 
SEG 

35 PRD eeeeecccccccchhhhhhhhhhhhhhhcccccccccccccccccccccccceeeeeccc 

SEfl TTY(2FHSVS(aSSS«2SLASLATTFL(3EKKAEA(2NHNCVPDVKALnESPEG(3LSLEPKSDS(3 
SEG • xx xxxxxxxxxxxx 

PRD eeeeeeccccccchhhhhhhhhhhhhhhhhhhccccccchhhhhcccccccccccccccc 

40 

SEG F(3ASHTGCflSPLCSAl3RHTPc3SPFTNHAAAAGRKTLRSCHGLEWFPELYPGYLGLGVLPG 

SEG xxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccchhhhhcchhhhhhccccccccccccccceeeccc 

45 SEC KP<2CUNAriTr2KPt3LISPr3GERLS(3GlilIRCNTTIRKSGFGGITMLFTGYFVLCCSliJSFRRL 
SEG xx 

PRD ccccccccccccccccccccchhhhhccccceeeecccccceeeecceeeeeecchhhhh 

SE(2 KKLCRPLPUKSTVPPCIGVAKTTGDCRSKTCLD 

50 SEG 

PRD hhhccccccccccccceeeeecccccccccccc 






(No Prosite data available for DKFZphtes3_11e17.3)
(No Pfam data available for DKFZphtes3_11e17.3)
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DKFZphtes3_12dlfi 



PCT/IB01/02050



group: testis derived

DKFZphtes3_12d18 encodes a novel 1170 amino acid protein without
similarity to known proteins.

The EST distribution indicates a ubiquitous expression pattern.
No informative BLAST results; no predictive prosite, pfam or SCOP
motif.

The new protein can find application in studying the expression
profile of testis-specific genes.

unknown protein
perhaps complete cds.
Sequenced by Qiagen

Locus: /map =T? 13ti-T cR from top of Chrl3 linkage group" 






Insert length: 5469 bp

Poly A stretch at pos. 5449, polyadenylation signal at pos. 5420



1 AAGGACAGAG GACGAGATTT TGAACGACAA AGAGAAAAGA GAGACAAGCC
51 AAGGTCTACT TCCCCAGCAG GACAGCATCA TTCTCCTATA TCTTCTAGAC
101 ATCACTCATC TTCCTCACAA TCAGGATCAT CTATTCAAAG ACATTCTCCT
151 TCTCCTCGTC GAAAAAGAAC TCCTTCACCA TCTTATCAGC GGACACTAAC
201 TCCACCTTTA CGACGCTCTG CCTCTCCTTA TCCTTCACAT TCTTTGTCGT
251 CTCCCCAGAG AAAGCAGAGT CCTCCAAGAC ATCGCTCTCC AATGCGAGAG
301 AAAGGGAGAC ATGATCATGA ACGAACTTCA CAGTCTCATG ATCGACGCCA
351 CGAAAGGAGG GAAGATACTA GGGGCAAACG AGACAGAGAA AAGGACTCAA
401 GAGAAGAACG AGAATATGAA CAGGATCAGA GCTCTTCTAG AGACCACAGA
451 GATGACAGAG AACCTCGAGA TGGTCGGGAT CGGAGAGATG CCAGAGATAC
501 TAGGGACCGA AGGGAACTAA GAGACTCCAG AGACATGCGG GACTCAAGGG
551 AGATGAGAGA TTATAGCAGA GATACCAAAG AGAGCCGTGA TCCCAGAGAT
601 TCTCGGTCCA CTCGTGATGC CCATGACTAC AGGGACCGTG AAGGTCGAGA
651 TACTCATCGA AAGGAGGATA CATATCCAGA AGAATCCCGG AGTTATGGCC
701 GAAACCATTT GAGAGAAGAA AGTTCTCGTA CGGAAATAAG GAATGAGTCC
751 AGAAATGAGT CTCGAAGTGA AATTAGAAAT GACCGAATGG GCCGAAGTAG
801 GGGGAGGGTT CCTGAGTTAC CTGAAAAGGG AAGTCGAGGC TCAAGAGGTT
851 CTCAAATTGA TAGTCACAGT AGTAATAGCA ACTATCATGA CAGCTGGGAA
901 ACTCGAAGTA GCTATCCTGA AAGAGATAGA TATCCTGAAA GAGACAACAG
951 AGATCAAGCA AGGGATTCTT CCTTTGAGAG AAGACATGGA GAGCGAGACC
1001 GTCGTGACAA CAGAGAGAGA GATCAAAGAC CAAGCTCACC AATTCGACAT
1051 CAGGGAAGGA ATGACGAGCT TGAGCGTGAT GAAAGAAGAG AGGAACGAAG
1101 AGTAGACAGA GTGGATGATA GGAGAGATGA AAGGGCTAGA GAGAGAGATC
1151 GGGAACGAGA ACGAGACAGG GAGCGGGAGA GAGAGAGGGA ACGTGAACGG
1201 GATCGGGAAA GAGAAAAAGA GAGAGAACTA GAAAGAGAGC GTGCTAGGGA
1251 ACGGGAGAGA GAAAGAGAAA AAGAGAGAGA TCGTGAAAGG GATAGAGACC
1301 GAGACCACGA TCGAGAGCGG GAAAGAGAGA GGGAACGAGA CAGGGAAAAA
1351 GAACGGGAAC GAGAAAGAGA AGAGAGAGAG AGGGAGAGAG AGCGAGAACG
1401 GGAGAGAGAG CGAGAGCGAG AACGGGAACG AGAAAGAGCG AGAGAAAGGG
1451 ATAAAGAACG AGAACGCCAA AGGGATTGGG AAGACAAAGA CAAAGGACGA
1501 GATGACCGCA GAGAAAAGCG AGAAGAGATC CGAGAAGATA GGAATCCAAG
1551 AGATGGACAT GATGAAAGAA AATCAAAGAA GCGCTATAGA AATGAAGGGA
1601 GTCCCAGCCC TAGACAGTCC CCGAAGCGCC GGCGTGAACA TTCTCCGGAC
1651 AGTGATGCCT ACAACAGTGG AGATGATAAA AATGAAAAAC ACAGACTCTT
1701 GAGCCAAGTT GTACGACCTC AAGAATCTCG TTCTCTTAGT CCCTCGCACC
1751 TCACAGAAGA CAGACAGGGT AGATGGAAAG AGGAGGATCG TAAACCAGAA
1801 AGGAAAGAGA GTTCAAGGCG CTACGAAGAA CAGGAACTCA AGGAGAAAGT
1851 TTCTTCTGTA GATAAACAGA GAGAACAGAC AGAAATCCTG GAAAGCTCAA
1901 GAATGCGTGC ACAGGACATT ATAGGACACC ACCAGTCTGA AGATCGAGAG
1951 ACATCTGATC GAGCTCATGA TGAAAACAAG AAGAAAGCAA AAATTCAAAA
2001 GAAACCAATT AAGAAAAAGA AAGAGGATGA TGTTGGAATA GAGAGGGGTA
2051 ACATAGAGAC AACATCTGAA GATGGTCAAG TATTTTCACC AAAAAAAGGA
2101 CAGAAAAAGA AAAGCATTGA AAAAAAACGT AAAAAATCCA AAGGTGATTC
2151 TGATATTTCT GATGAAGAAG CAGCCCAGCA AAGTAAGAAG AAAAGAGGCC
2201 CACGGACTCC CCCTATAACA ACTAAAGAGG AATTGGTTGA AATGTGCAAT
2251 GGTAAGAATG GTATTCTAGA GGACTCCCAG AAAAAAGAAG ATACAGCATT
2301 CAGTGACTGG TCTGATGAGG ATGTCCCTGA CCGTACAGAG GTGACAGAAG
2351 CAGAGCATAC TGCCACCGCC ACGACTCCTG GTAGTACCCC TTCTCCTCTA
2401 TCTTCTCTTC TTCCTCCTCC ACCGCCTGTG GCTACTGCCA CTGCTACAAC
2451 TGTGCCTGCA ACTCTTGCTG CCACTACTGC TGCTGCCGCC ACCTCTTTCA
2501 GCACATCTGC CATCACTATT TCCACCTCTG CCACCCCCAC CAATACCACC
2551 AATAATACTT TTGCCAATGA AGACTCACAC AGAAAATGCC ACAGAACACG
2601 AGTAGAAAAA GTAGAGACGC CTCACGTGAC TATAGAAGAT GCACAGCATC
2651 GCAAGCCTAT GGATCAAAAG AGGAGCAGCA GCCTCGGGAG CAATCGGAGT
2701 AACCGTAGTC ATACGTCTGG TCGTCTTCGC TCCCCATCCA ATGATTCAGC
2751 CCATCGAAGT GGAGATGACC AAAGTGGTCG AAAGAGAGTA CTGCACAGTG
2801 GCTCAAGAGA TAGAGAAAAA ACAAAAAGCC TGGAAATCAC AGGAGAGAGA
2851 AAATCTAGGA TTGATCAGTT AAAGCGTGGA GAACCCAGTC GAAGTACTTC
2901 TTCAGATCGC CAGGATTCAA GAAGCCATAG TTCAAGAAGA AGTTCTCCAG
2951 AGTCAGATCG ACAGGTCCAT TCAAGATCTG GGTCATTTGA TAGCAGAGAC
3001 AGGCTTCAAG AACGAGATCG ATATGAACAC GACAGAGAGC GCGAGAGAGA
3051 GAGGAGAGAT ACGAGGCAGA GAGAATGGGA CCGAGATGCT GATAAAGATT
3101 GGCCACGCAA CAGGGATCGA GATAGATTGC GAGAACGAGA ACGAGAGAGA
3151 GAACGAGACA AAAGGAGAGA CTTGGATAGG GAAAGAGAGA GACTAATTTC
3201 TGATTCTGTT GAAAGGGACA GGGACAGAGA CAGAGACAGA ACTTTTGAGA
3251 GTTCTCAAAT AGAGTCTGTG AAACGCTGTG AAGGAAAACT GGAAGGTGAA
3301 CATGAAAGGG ATCTAGAAAG CACTTCCCGA GACTCTCTAG CCTTGGATAA
3351 AGAGAGAATG GATAAAGATC TGGGATCTGT GCAGGGATTT GAAGATACAA
3401 ATAAATCCGA GAGAACTGAG AGTCTGGAAG CAGGAGATGA CGAGTCCAAG
3451 TTAGATGATG CACATTCATT AGGCTCTGGT GCTGGAGAAG GATACGAGCC
3501 AATCAGTGAT GACGAACTAG ATGAAATTCT GGCAGGTGAT GCAGAAAAGA
3551 GGGAGGACCA ACAGGATGAG GAGAAGATGC CAGATCCCTT AGATGTGATA
3601 GATGTGGATT GGTCTGGTCT TATGCCAAAG CATCCAAAAG AACCACGAGA
3651 GCCTGGGGCT GCACTCTTAA AATTCACACC TGGAGCTGTT ATGCTAAGAG
3701 TTGGGATTTC TAAAAAGTTG GCAGGTTCTG AACTCTTTGC CAAAGTCAAA
3751 GAAACATGTC AGAGACTTTT AGAAAAACCC AAAGGTAGTT TCATTTTACT
3801 TTAACTATAT AATGTCTGTT AACCATTTAA GATGCCATCT GAAGGGGATT
3851 CTGATCTGTT CTTATGTAGC ACTTAACACT GTGTAGAAAC TATTTTTTGA
3901 GAAATCATTT TATAATCATT ATTTAACCCT CATGGTCAAA GTTTCTCTTT
3951 AAAATTTATT TTGAGAAGAA GAGTTATCCC ACAGAAAAGT TGGGAAAAGA
4001 GTACAATGAC CTTTTTGTAT GAAAATTACT TATTAACAGG CCAGGCGTGG
4051 TGTTGCATGT CTGTAGTCAC AGCTACTCAG GGAGGTTGAG GCAGCAGGAT
4101 TGCTGGAGCC CAGGAAATTG AGGCTGCAGT GAGCCATGAT TGAGCCACCA
4151 CACTCCAACC TAGGTGACAG AGCAAGACCC TGTCTCAAAA AAAAAAAAAC
4201 AAATTAACCA ATAAGTTCTA ATATCAAAGT GCTCAGTGGT TTGCCCTTGG
4251 CTAAATGAAG CAGAGCCAGG AAAAACAGAC TACATATTTT TCATGTCTAA
4301 AGAAATTGGG TATTTTGGCA GCCCTTTCCC CTAGACATCT ACCCAAATGC
4351 AGGTGTGTAG GTTGAGTCTT TAACAAAGTG ATTAAGAGCT TGGTCTGTAA
4401 GGCCGGATGA TCTGGATTTC AGTAGGCACA CCACTTACTG GCTATTACTT
4451 AATCTGTGTG TTAGTGTCAT CATCTGTAAG TCAGGAATAA TCATACCACC
4501 AACTTCCTAT GGTAATTAGG AGCAAATGAG TTATTACAGG CAAAACACTT
4551 AGAACAGTTC CTGGCATATA GTAATACCCA ATAAATATTA ACTGCTACTT
4601 TGAAAATATC CTATCACGCT GATTTTTGAC CTCACTGCAG CAATTTTCAG
4651 TTATTCCAGA TTATCTAGCT TATGGATTCT GGTGGTAGGG GTTGTTTGGT
4701 TTTGGTTTTC ACTGTCTCTG TCTCATCTAG TACCTACCTT AGTTTATTTT
4751 GCAACTTACT AATACTTTAT TAATGGGGAG GGACGAGTAG ATGGTAAAAA
4801 GAAGGAAAAG GAGGTAAAAG GTGAAAGGAA CAACATTAAT TAACAATTTT
4851 ACGTCATGTC CCTGGACATA AAAGTTTAGT TAGTATTAAA TTTTTCACTA
4901 ATACAAAATA AAAAAATATT GTTTTATGAG TTTTATGAAT TCATGCCCTT
4951 CCTTTACTCT ATTAGCATAA GCAGTAAATT TTTTTATTTT AATATAGCCC
5001 AATAAACCTA GAGTATACAT GTACAAAATA CATATAATTG TTAACGTGTA
5051 TTAACCGAAA AATGACCCAA GACTTAGTTC TTGCCCTACT GTATCTGCCT
5101 TGTTTGGTTG GTTCTGTGAC CTTAAGCAAA TAACTCCTGT GAGCCTCAAT
5151 TTTATTTGTA AAGTGATGGA ATAAAACCCC TAAAATCTTA CCCACCTCTA
5201 AAGATATTTG TTTCTGTGAC CTTTTGCTAG TAGCATTTCA AGTTAAAATC
5251 TGGTTTGATT TTGCTACCCA TGAAATACAG TTCGGCCCTT ACTTATTGAT
5301 GACTTAACCT AAACAGTGAA AATATGCACT GTAAAGGGTG GGGTGATGTG
5351 GCTTAACAAT CAGACTTCTT CTATTTTTGC TGCTATGGTG GTTGTATTAG
5401 AGAACTGATG TATTATCTTG AATAAAGACT TTGTCTTGTT TACTGCCCTA
5451 AAAAAAAAAA AAAAAAAAA

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BLAST Results 



No BLAST result 



Medline entries 



No Medline entry
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Peptide information for frame 1 






ORF from 292 bp to 3801 bp; peptide length: 1170
Category: similarity to unknown protein
Classification: no clue

1 MREKGRHDHE
51 DHRDDREPRD
101 PRDSRSTRDA
151 NESRNESRSE
201 SWETRSSYPE
251 IRHQGRNDEL
301 RERDREREKE
351 REKERERERE
401 KGRDDRREKR
451 SPDSDAYNSG
501 KPERKESSRR
551 DRETSDRAHD
601 KKGQKKKSIE



RTSC2SHDRRH 
GRDRRDARDT 
HDYRDREGRD 
IRNDRMGRSR 
RDRYPERDNR 
ERDERREERR 
RELERERARE 
ERERERERER 
EEIREDRNPR 
DDKNEKHRLL 
YEE<2ELKEKV 
EfNTKICIcCAKIfilC 
KKRKKSKGDS 



ERREDTRGKR 
RDRRELRDSR 
THRKEDTYPE 
GRVPELPEKG 
DSARDSSFER 
VDRVDDRRDE 
REREREKERD 
ERERERERER 
DGHDERKSKK 
S6VVRPC3ESR 
SSVDKflREflT 
KPIKKKKEDD 
DISDEEAA<2(3 



DREKDSREER 
DMRDSREMRD 
ESRSYGRNHL 
SRGSRGS<2ID 
RHGERDRRDN 
RARERPRERE 
RERDRDRDHD 
ERARERDKER 
RYRNEGSPSP 
SLSPSHLTED 
EILESSRMRA 
VGIERGNIET 
SKKKRGPRTP 



EYEdDtfSSSR 
YSRDTKESRD 
REESSRTEIR 
SHSSNSNYHD 
RERDflRPSSP 
RDRERERERE 
RERERERERD 
ERtfRDWEDKD 
RflSPKRRREH 
ROGRUKEEDR 
(2DIIGHHGSE 
TSEDGtfVFSP 
PITTKEELVE 
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t51 MCNGKNGILE DStfKKEDTAF SDUSDED VPD RTEVTEAEHT ATATTPGSTP 

701 SPLSSLLPPP PPVATATATT VPATLAATT A AAATSFSTSA ITISTSATPT 

751 NTTNNTFANE DSHRKCHRTR VEKVETPHVT IEDAtfHRKPII DflKRSSSLGS 

AD1 NRSNRSHTSG RLRSPSNDSA HRSGDDflSGR KRVLHSGSRD REKTKSLEIT 

5 fl51 GERKSRIDflL KRGEPSRSTS S3>R<2DSRSHS SRRSSPESDR (2VHSRSGSFD 

■=101 SRDRLC2ERDR YEHDRERERE RRDTR<2REbJD RDADKDUPRN RDRDRLRERE 

^51 RERERDKRRD LDRERERLIS DSVERDRDRD RDRTFESSGI ESVKRCEAKL 

1Q01 EGEHERDLES TSRDSLALDK ERI1DKDLGSV (2GFEDTNKSE RTESLEAGDD 

1051 ESKLDDAHSL GSGAGEGYEP ISDDELDEIL AGDAEKREDG (2DEEKNPDPL 

10 1101 DVIDVDUSGL MPKHPKEPRE PGAALLKFTP GAVNLRVGIS KKLAGSELFA 

1151 KVKETCtfRLL EKPKGSFILL 



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_12d18, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphtes3_12d18, frame 1
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Report for DKFZphtes3_12dlA. 1 



[LENGTH] 1267
30 EI1U3 1505=13^5 
IpI3 

[HOMOL] TREMBL:AB020660_1 gene: "KIAA0853"; product: "KIAA0853 protein"; Homo sapiens mRNA for KIAA0853 protein, partial cds. 0.0
35 EBL0CKS3 BL0DM22C Granins proteins 

EBLOCKSJ BLDDfiD3F 

EBL0CKS3 PRDD3D6C 

CBL0CKS3 PRDlOfilB 

EBL0CKS3 PRDDD4TD 
40 EBL0CKS3 PR010&3A 

EBL0CKS3 PR005M5A 

EBL0CKS3 BL000M A Protamine PI proteins 
ICBL0CKS3 PF01140D 
IEBL0CKS3 PRDDA33H 
[KW] All_Alpha
[KW] LOW_COMPLEXITY 44.12 %

SEQ KDRGRDFERflREKRDKPRSTSPAG(3HHSPISSRHHSSSS(3SGSSIi3RHSPSPRRKRTPSP 

50 SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccchhhhhhhhccccccccccccccccccccccccccccccceeeccccccccccccc 

SEt3 SYi2RTLTPPLRRSASPYPSHSLSSPt2RK<2SPPRHRSPMREKGRHDHERTSt2SHI>RRHERR 

SEG x xxxxxxxxxxxxx xxxxxxxx 

55 PRD ccccccccccccccccccccccccccccccccccccccccccccccccccccchhhhhhc 

SE(3 EDTRGKRDREKDSREEREYE(3D(3SSSR]>HRI>DREPRI>GRI>RRl>ARI>TRI>RRELR]>SRDf1R 
SEG xx.xxxxxxxxxxxxxxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 
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PRD cccccccccccchhhhhhhhhhccccccccccccccccccchhhhhhhhhhhhhhhcccc 

SE(2 DSREMRDYSRDTKESRDPRDSRSTRDAHDYRDREGRDTHRKEDTYPEESRSYGRNHLREE 

SEC xxxxxxxxxxx- • • xxxxxxxxxx xx 

PRD hhhhhhhccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SE<2 SSRTEIRNESRNESRSEIRNDRHGRSRGRVPELPEKGSRGSRGS(2IDSHSSNSNYHDSWE 

xxxxxxxxxxxx - . 

PRD hhhhhhhccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SE<3 TRSSYPERDRYPERDNRD(3ARDSSFERRHGERDRRDNRERDC2RPSSPIRHl2GRNDELERD 

SEG - - - xxxxxxxxxxxxxxxxxx XXX XXX 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccchhhhhh 

SE<3 ERREERRVDRVDDRRDERARERDRERERDRERERERERERDREREKERELERERARERER 

SEG XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 

PRD hhhhhhhhhcchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SE<3 EREKERDRERDRDRDHDRERERERERDREKEREREREERERERERERERERERERERERA 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SE(2 RERDKERERi2RDUEDKDKGRDDRREKREEIREDRNPRDGHDERKSKKRYRNEGSPSPR(2S 

SEG xxxxxxxxx- xxxxxxxxxxx xxx xxxxx xxxxx xxxxxxx 

PRD hhhhhhhhhhhhhhhccccccchhhhhhhhhhccccccccccchhhhhhccccccccccc 

SE<2 PKRRREHSPDSDAYNSGDDKNEKHRLLS(3VVRP(3ESRSLSPSHLTEDR(2GRUKEEDRKPE 

SEG xxxxx . 

PRD ccccccccccccccccccccchhhhhhhhhcccccccccccccccchhhhhhhhhhccch 

SE<2 RKESSRRYEE(3ELICEKVSSVDK(3RE(3TEILESSRnRA(3DIIGHH(3SEDRETSDRAHDENK 

SEG x 

PRD hhhhhhhhhhhhhhhhhhccchhhhhhhhhh^ 

SEO KKAKI(3KKPIKKKKEDDVGIERGNIETTSEDG(3VFSPKKG(3KKKSIEKKRKKSKGDSDIS 

SEG xxx xxxxxxxxxxx xxxxx xxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhccccccccccccccccceeecccceeecccccchhhhhhhhhhccccccccc 

SE<2 DEEAA(3(3SKKKRGPRTPPITTKEELVEMCNGKNGILEDSr2KKEDTAFSDlJSDEDVPDRTE 

SEG xxx xx 

PRD hhhhhhhhhhcccccccccccchhhhhhcccccceeecccccccccccccccccccceee 

SE<2 VTEAEHTATATTPGSTPSPLSSLLPPPPPVATATATTVPATLAATT AAAATSFSTSAITI 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhccccccccccceeQccccccceeeeeGGCccchhhhhhhhhhhhccccceee 

SEfi STSATPTNTTNNTFANEDSHRKCHRTRVEKVETPHVTIEDA(3HRKPMD(3KRSSSLGSNRS 

SEG xxxxx xxxxxxxxxxx xxxxxxxxxx 

PRD eccccccccccccccccccccchhhhheeeeccceeeecccccccccccccccccccccc 

SE(3 NRSHTSGRLRSPSNDSAHRSGDD(3SGRKRVLHSGSRDREKTKSLEITGERKSRID(3LKRG 

SEG xxx 

PRD ccccccccccccccccccccccccccceeeeeccccccccceeeeehhhhhhhhhhhhcc 

SE<3 EPSRSTSSDRgDSRSHSSRRSSPESDRdVHSRSGSFDSRDRLdERDRYEHDRERERERRD 

SEG . -xxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccGeGecccccccchhhhhhhhhhchhhhhhhhhh 
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SE<3 TRflREWDRDADKDWPRNRDRDRLRERERERERDKRRDLDRERERLISDSVERDRDRDRDR 

SEG xxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxx 

PRD hhhhhhhhccccccccccchhbhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccce 

SEfl TFESSfllESVKRCEAKLEGEHERDLESTSRDSLALDKERMDKDLGSVflGFEDTNKSERTE 

SEG 

PRD eechhhhhhhhhhhhhhhhhhccccccccccchhhhhhhhhhcccccccccccccccccc 

SE<3 SLEAGDDESKLDDAHSLGSGAGEGYEPISDDELDEILAGDAEKREDCGDEEKIIPDPLDVI 

SEG 

PRD cccccccccccccccccccccccccccccccccceeeecchhhhhhhhhhhcccccceeB 

SEfl DVDUSGLriPKHPKEPREPGAALLKPTPGAVf1LRVGISKKLAGSELFAKVKETC(3RLLEKP 

SEG 

PRD eccccccccccccccccccceeeeeccceeGeeecccccccchhhhhhhhhhhhhhhhcc 

SEfl KGSFILL 

SEG 

PRD. ccccccc 



(No Prosite data available for DKFZphtes3_12d18.1)
(No Pfam data available for DKFZphtes3_12d18.1)
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5 group: testis derived 

DKFZphtes3_14l7 encodes a novel 815 amino acid protein without
similarity to known proteins.

The mRNA is transcribed ubiquitously.

No informative BLAST results; no predictive prosite, pfam or SCOP
motif.

The new protein can find application in studying the expression
profile of testis-specific genes.

similarity to C. elegans B0412.3

see also DKFZphtes3_17n3
perhaps complete cds.

Sequenced by BMFZ
Locus: unknown

Insert length: 3522 bp

Poly A stretch at pos. 3456, polyadenylation signal at pos. 3437
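
The poly(A) stretch and polyadenylation signal positions can be recovered from the insert itself. The sketch below assumes that the signal is the canonical AATAAA hexamer and that positions are reported as 0-based offsets; both are inferences from this listing rather than statements in it. The function names are hypothetical, and `tail` holds nucleotides 3401 to 3456 of the insert followed by a shortened poly(A) run.

```python
def polya_start(seq: str) -> int:
    """0-based offset of the first A in the terminal run of A residues."""
    i = len(seq)
    while i > 0 and seq[i - 1] == "A":
        i -= 1
    return i


def polya_signal(seq: str, stretch_start: int, motif: str = "AATAAA") -> int:
    """0-based offset of the last motif lying entirely upstream of the poly(A) stretch (-1 if absent)."""
    return seq.rfind(motif, 0, stretch_start)


offset = 3400  # 0-based offset of the first base of `tail` within the insert
tail = ("AAAGGTGGAGGCATTGTAAAAAGGAAATAGTGATGTAAATAAACATGTTC"
        "TCTTTC" + "A" * 20)

start = polya_start(tail)
print(offset + polya_signal(tail, start), offset + start)
```

Under these assumptions the sketch returns 3437 for the signal and 3456 for the start of the poly(A) stretch.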






1 AACACATCGA CTTGTGTAAG AAAAAGATTG GAAGTGCGGA GGTGTCTTTT
51 GAGCATGATG CATGGATGTC TAAACAATTC CAGGCCTTTG GAGATTTATT
101 TGATGAAGCT ATTAAGTTAG GGTTAACAGC TATTCAAACT CAGAATCCTG
151 GTTTCTATTA CCAGCAGGCA GCATACTATG CCCAGGAGCG GAAACAGCTT
201 GCAAAAACCC TCTGTAACCA CGAAGCTTCT GTAATGTATC CCAATCCTGA
251 TCCCTTAGAA ACACAAACAG GCGTTCTTGA CTTTTATGGA CAAAGATCAT
301 GGCGACAAGG AATACTAAGT TTTGATCTTT CTGATCCTGA AAAAGAAAAG
351 GTGGGAATTC TTGCCATTCA GCTGAAGGAG AGAAATGTTG TTCACTCTGA
401 GATAATCATA ACTCTTCTGA GCAATGCTGT TGCACAGTTC AAGAAGTATA
451 AGTGCCCGCG AATGAAAAGT CACCTAATGG TTCAGATGGG AGAGGAATAT
501 TATTACGCAA AGGATTATAC CAAAGCTTTG AAGTTGCTGG ATTATGTGAT
551 GTGTGATTAT CGGAGTGAAG GATGGTGGAC TCTGCTCACT TCTGTATTAA
601 CTACAGCTCT GAAGTGCTCC TACCTCATGG CCCAATTAAA GGATTACATT
651 ACTTACTCCC TAGAACTCCT TGGTAGAGCT TCAACTCTGA AAGATGACCA
701 GAAGTCTCGG ATAGAAAAGA ACCTCATAAA TGTTTTAATG AATGAAAGTC
751 CTGATCCAGA ACCCGACTGT GATATCTTAG CTGTGAAAAC TGCTCAGAAG
801 CTGTGGGCAG ACCGAATTTC TCTGGCTGGC AGCAATATTT TCACAATAGG
851 AGTACAGGAC TTTGTGCCAT TTGTGCAGTG CAAAGCCAAG TTTCATGCCC
901 CAAGTTTTCA TGTTGATGTT CCTGTTCAGT TTGATATTTA TCTGAAGGCT
951 GATTGTCCAC ATCCCATTAG GTTTTCCAAG CTCTGTGTCA GCTTTAATAA
1001 TCAGGAATAC AACCAGTTCT GTGTAATAGA AGAAGCATCC AAAGCAAATG
1051 AAGTTTTAGA AAATCTGACT CAAGGAAAGA TGTGCCTAGT TCCTGGCAAA
1101 ACAAGAAAAC TGTTATTTAA GTTTGTTGCA AAAACTGAAG ATGTGGGAAA
1151 GAAAATTGAG ATTACTTCAG TGGATCTTGC TCTGGGCAAT GAGACGGGAA
1201 GATGTGTGGT TTTAAATTGG CAGGGAGGAG GAGGAGATGC TGCTTCCTCC
1251 CAAGAAGCCT TACAGGCAGC TCGGTCTTTC AAAAGGCGAC CTAAGCTACC
1301 TGACAATGAA GTTCACTGGG ACAGCATTAT AATTCAGGCA AGCACAATGA
1351 TCATATCCAG AGTCCCAAAC ATTTCTGTAC ATCTGCTACA TGAACCCCCT
1401 GCACTGACTA ATGAAATGTA TTGTTTGGTT GTGACTGTTC AGTCCCATGA
1451 AAAGACCCAA ATCAGAGATG TGAAGCTCAC TGCTGGCTTA AAACCAGGAC
1501 AGGATGCCAA TTTAACTCAG AAGACTCACG TGACTCTTCA TGGACCAGAA
1551 CTGTGTGATG AATCCTACCC GGCTTTACTC ACTGACATTC CTGTTGGAGA
1601 CTTACATCCA GGGGAACAGC TGGAAAAAAT GTTGTATGTT CGCTGTGGAA
1651 CAGTGGGTTC CAGAATGTTT CTTGTATATG TTTCTTACCT GATAAATACA
1701 ACCGTTGAAG AAAAAGAAAT TGTTTGCAAG TGTCACAAGG ATGAAACTGT
1751 AACAATTGAA ACAGTCTTTC CATTTGATGT TGCGGTTAAA TTTGTTTCTA
1801 CCAAGTTTGA GCACCTGGAA AGGGTTTATG CTGACATCCC CTTTCTGTTG
1851 ATGACGGACC TCTTAAGTGC CTCACCCTGG GCCCTCACTA TTGTTTCCAG
1901 TGAGCTCCAG CTTGCTCCAT CCATGACCAC AGTGGACCAG CTCGAGTCTC
1951 AAGTGGACAA TGTTATCTTA CAGACTGGAG AGAGTGCTAG TGAATGCTTT
2001 TGTCTTCAAT GCCCATCTCT TGGAAATATT GAAGGTGGAG TAGCAACCGG
2051 GCATTATATT ATCTCTTGGA AAAGGACCTC AGCAATGGAG AATATCCCCA
2101 TCATCACAAC TGTCATCACT CTGCCGCACG TGATTGTGGA GAATATCCCT
2151 CTCCATGTGA ATGCAGATCT GCCGTCATTT GGGCGTGTCA GAGAGTCGTT
2201 ACCTGTCAAG TATCACCTAC AGAATAAGAC CGACTTAGTT CAAGATGTAG
2251 AAATTTCTGT GGAGCCCAGT GATGCCTTCA TGTTCTCAGG TCTCAAACAG
2301 ATTCGATTAC GTATCCTCCC TGGCACGGAG CAGGAAATGC TATATAATTT
2351 CTATCCTCTG ATGGCTGGAT ACCAGCAGCT GCCATCTCTC AACATCAACT
2401 TGCTTAGATT TCCTAACTTC ACAAATCAGC TGCTCAGGCG TTTTATACCT
2451 ACCAGTATTT TTGTCAAGCC ACAGGGTCGA CTCATGGATG ATACCTCTAT
2501 TGCTGCTGCA TGATGTTCAA GACCGGCCCT TGGCTGTTGT TACAGAGATG
2551 TTGGGCAGAG CTATGCAGGT GTTTCATTGT GAACTCTAGC TTTGATGATG
2601 GTAAAAAGTT AACCTTTTCT ATTTTTTAAT GGATGTTATA CCAACTATTC
2651 AGAGGAACTC ATACTTCAAA AATATTAGGA AAATCTGTCT TATAGTTTCT
2701 CTAATAAATA TCTGAAATCT CAGTACGACA TGAAAGAATG TCAGACCATT
2751 GTTATTGTTG AAAGTCATTT GATGAATGGT AAATTCTATG AAAAGTAAGT
2801 GATTTGCATG TATAATATCA GGAAAATTAA GCATCCCAAG TGTGACTGGA
2851 CAAAGAGAGC AGATGCACCA GTGCCTGTGC CATAAAGTTC CGAATCCCCC
2901 ATGTGTCTCT TTCAGAGCTG GCCAGACCGG AAATAAATCA TTCTCATAAA
2951 TTCAGTGTGT ACTCAGAACA CATACACAAC AACATAGGGA GTTGTATGAC
3001 TGATACGGAA AACTTCCAGA AAGTTTTAAT CAAAGCAGTT TAATTAAGGT
3051 ATCAAAAATA TCTTTGCTTA CTATCAAGAA GTGTCAAATA GGTTCAGCTT
3101 GCTGCCAAAA TATGGATCAT TTATGAAGCA GGTTCATATT TTAGAGGTGT
3151 TAATAAAATC CTCATGGGAA AAGATCCAAA GTGCAAGGAT TTGATTATAA
3201 ACATAATTTC CTAGACTGAA AGTTTTTGGA AAAGATGCAG GGTCTGAGTC
3251 AGGCCTTCTG GTTATATTGT GCAGTTTCAA AAGAACTATT TAAAACTCTT
3301 GAAAACTCAT GTAAATAAAA ATCATAGGGT GAAAATTGTA TTTGTTAAAA
3351 TACCTTAATA ATTTAAAATG ACCTGATTTC CTGGAAAATT TTATTATTCA
3401 AAAGGTGGAG GCATTGTAAA AAGGAAATAG TGATGTAAAT AAACATGTTC
3451 TCTTTCAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
3501 AAAAAAAAAA AAAAAAAAAA AA



BLAST Results

No BLAST result

Medline entries

No Medline entry
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WO 01/98454 



Peptide information for frame 3 



ORF from 66 bp to 2510 bp; peptide length: 815
Category: similarity to unknown protein 
Classification: no clue 
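
The peptide can be checked against the insert by translating the ORF from its start position. This is a minimal sketch using the standard genetic code; the `translate` helper is hypothetical, coordinates are assumed to be 1-based, and `fragment` holds nucleotides 66 to 95 of the insert, which is where the ATG that opens frame 3 sits.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str, start: int = 1) -> str:
    """Translate from a 1-based start position until a stop codon or the end of seq."""
    peptide = []
    for i in range(start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3].upper(), "X")  # 'X' marks unreadable codons
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


fragment = "ATGTCTAAACAATTCCAGGCCTTTGGAGAT"
print(translate(fragment))
```

For this fragment the translation is MSKQFQAFGD, the first ten residues of the frame 3 peptide.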






1 NSK(2F(2AFGD 
51 NHEASVMYPN 
101 K3LKERNVVH 
151 YTKALKLLDY 
201 LLGRASTLKD 
S51 ISLAGSNIFT 
301 IRFSKLCVSF 
351 FKFVAKTEDV 
M01 AARSFKRRPK 
MSI IIYCLVVTVdS 
501 YPALLTDIPV 
551 EIVCKCHKDE 
bOl SASPUALTIV 
t51 SLGNIEGGVA 
701 DLPSFGRVRE 
751 LPGTEflEMLY 
fiOl KPdGRLJIDDT 



LFDEAIKLGL 
PDPLETtfTGV 
SEIIITLLSN 
VUCDYRSEGld 
DfiKSRIEKNL 
IGVdDFVPFV 
NN(2EYN(3FCV 
GKKIEITSVD 
LPDNEVHUDS 
HEKTtflRDVK 
GDLHPGEtfLE 
TVTIETVFPF 
SSELflLAPSM 
TGHYIISWKR 
SLPVKYHL<2N 
NFYPLMAGYd 
SIAAA 



TAIGTtfNPGF 
LDFYGtfRSUR 
AVAlSFKKYKC 
WTLLTSVLTT 
INVLtlNESPD 
(3CKAKFHAPS 
IEEASKANEV 
LALGNETGRC 
IIIflASTMII 
LTAGLKPGC2D 
KMLYVRCGTV 
DVA VKFVSTK 
TTVDdLESdV 
TSANENIPII 
KTDLVflDVEI 
(2LPSLNINLL 



YY(3(3AAYYA(3 
QGILSFDLSD 
PRflKSHLPIVd 
ALKCSYLNAfl 
PEPDCDILAV 
FHVDVPVC2F3) 
LENLTtJGKMC 
VVLNW &GGGG 
SRVPNISVHL 
ANLTdKTHVT 
GSRNFLVYVS 
FEHLERVYAD 
DNVILdTGES 
TTVITLPHVI 
SVEPSDAFMF 
RFPNFTNdLL 



ERKdLAKTLC 
PEKEKVGILA 
MGEEYYYAKD 
LKDYITYSLE 
KTAfiKLUADR 
IYLKADCPHP 
LVPGKTRKLL 
DAASSOEALfl 
LHEPPALTNE 
LHGPELCDES 
YLINTTVEEK 
IPFLLMTDLL 
ASECFCLCCP 
VENIPLHVNA 
SGLKfllRLRI 
RRFIPTSIFV 






BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_14l7, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphtes3_14l7, frame 3






Report for DKFZphtes3_14l7.3



[LENGTH] 836
EHUD ^MSMT-SO 
Ipll 5. AM 

[HOMOL] TREMBL:CEUB0412_2 gene: "B0412.3"; Caenorhabditis elegans cosmid B0412. 6e-30
[KW] Alpha_Beta

IKW1 LOW_COMPLEXITY L50 ^ 

SEd HIDLCKKKIGSAELSFEHDAUriSKt3F(3AFGDLFDEAIKLGLTAI(3T(2NPGFYY(3(2AAYYA 

SEG xxxxxxxxx 

PRD ccceeeeeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccceeeccccchhhhhhhhh 

SE<2 t3ERK(3LAKTLCNHEASVMYPNPDPLET(3TGVLDFYGr3RSUR(3GILSFDLSDPEKEKVGIL 

SEG x 

PRD hhhhhhhhhhhhhcceeeccccccccceeeeeeeeccccceeeceeeeeccchhhhhhhh 
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WO 01/98454 PCT/1B01/02050 

SE<2 AlflLKERNVVHSEIIITLLSNAVAflFKKYKCPRNKSHLMVOMGEEYYYAKDYTKALKLLD 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccceeehhhhhhhhhhhh 

5 SEfl YVnCDYRSEGUUTLLTSVLTTALKCSYLnAtJLKDYITYSLELLGRASTLKDDflKSRIEKN 

SEG 

PRD hhhhccccccceeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhchhhhhccccchhhhh 

SE£J LINVLNNESPDPEPDCDILAVKTA<2KLIJADRISLAGSNIFTIGV(2DFVPFV<2CKAKFHAP 

10 SEG 

PRD hheeeeccccccccccchhhhhhhhhhhhhhhhhhcccceeeeeeehhhhhhhhhhcccc 

SEC SFHVDVPV(2FDIYLKADCPHPIRFSKLCVSFNN(2EYN<2FCVIEEASKANEVLENLT<3GKN 

SEG 

15 PRD eeeeeeecceeeeeecccccceeeeeeeeecccccccceeeeeccccchhhhhccccccc 

SE<3 CLVPGKTRKLLFKFVAKTEDVGKKIEITSVDLALGNETGRCVVLNWCJGGGGDA ASSflEAL 

SEG 

PRD cccccccccchhhhhhhhhccceeeeeeeecccccccccceeeeeecccccccchhhhhh 

20 

SEfl QAARSFKRRPKLPDNEVHUDSIIIflASTrillSRVPNISVHLLHEPPALTNEriYCL VVTVfl : 

SEG -- 

PRD hhhhhhhhcccccccccccceeeeeeeceeeeccccceeeeeccccccccceeeeeeeee 

25 SEfl SHEKTtJIRDVKLTAGLKPG(3DANLT(2KTHVTLHGPELCDESYPALLTDIPVGDLHPGE(JL 

SEG 

PRD cccccceeeeeccccccccchhhhhheeeeecccccccccceeeeecccccccccccchh 

SEtl EKriLYVRCGTVGSRriFLVYVSYLINTTVEEKEIVCKCHKDETVTIETVFPFDVAVKFVST 

30 SEG 

PRD hhhhhhcccccchhhhhcchhhhhccccceeeeeeecccccceeeeeeccceeeeeeeeh 

SE(2 KFEHLERVYADIPFLLf1TDLLSASPUALTIVSSEL(3LAPSI1TTVD(2LESd2VDNVIL(2TGE 

SEG 

35 PRD hhhhhhhhhhccceeeehhhhhccccceeehhhhhhhhccceeeccccccccceeeeccc 

SEfl sasecfclocpslgnieggvatghyiisljkrtsahenipiittvitlphviveniplhvn 

SEG 

PRD cceeeeeeeecccccccccccceeeeeeeeccccccccceeeeeeeeeeeeeeecccccc 

40 

SE<2 ADLPSFGRVRESLPVKYHLf2NKTDLV<JDVEISVEPSDAFMFSGLK<2IRLRILPGTE(JEI1L 

SEG 

PRD cccccccceeeccceeeeeecccccceeeeeecccccceeeccccccceeeccccccccc 

45 SE<2 YNFYPLMAGYflOLPSLNINLLRFPNFTNflLLRRFIPTSIFVKPflGRLMDDTSIAAA 

SEG 

PRD cccccccccccccccccccccccccccchhhhhcccceeeeecccccccccccccc 

(No Prosite data available for DKFZphtes3_14l7.3)

(No Pfam data available for DKFZphtes3_14l7.3)
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WO 01/98454 
DKFZphtes3_15n14

group: testis derived

DKFZphtes3_15n14 encodes a novel 713 amino acid protein with weak
similarity to the neurofilament triplet M protein of the rat.
Neurofilaments are the intermediate filaments specific to nervous
tissue. They are probably essential to the tensile strength of
the neuron, as well as to transport of molecules and organelles
within the axon. Until now, ESTs of the novel mRNA could only be
isolated from testes, germ cells and uterus.
No informative BLAST results; no predictive prosite, pfam or SCOP
motif.

The new protein can find application in studying the expression
profile of testis-specific genes.

similarity to neurofilament triplet M protein - rat

few EST hits (6 of 9 hits from testis)
perhaps complete cds.

Sequenced by GBF
Locus: unknown

Insert length: 2389 bp

Poly A stretch at pos. 2328, polyadenylation signal at pos. 2306



1 TGGGCCCCAC CTCCTCAGCA CAACTTTCTG AAAAACTGGC AGCGTAACAC
51 AGCCCTGCGG AAGAAGCAGC AGGAAGCCCT CAGCGAACAC CTAAAGAAGC
101 CAGTGAGTGA GCTGCTCATG CACACCGGGG AGACCTACAG ACGGATCCAG
151 GAGGAGCGGG AGCTCATTGA CTGCACACTT CCAACCCGGC GTGATAGGAA
201 AAGCTGGGAG AACAGTGGGT TCTGGAGTCG ACTGGAATAC TTGGGAGATG
251 AGATGACAGG TCTGGTCATG ACCAAGACAA AAACTCAGCG TGGCCTCATG
301 GAGCCCATCA CTCATATCAG GAAGCCCCAC TCCATCCGGG TGGAGACAGG
351 ATTACCAGCC CAGAGGGACG CTTCATACCG CTACACCTGG GATCGGAGTC
401 TGTTTCTGAT CTACCGACGC AAGGAGCTGC AGAGAATCAT GGAAGAGCTG
451 GATTTCAGCC AGCAGGATAT TGATGGCCTG GAGGTGGTGG GCAAAGGGTG
501 GCCCTTCTCG GCTGTTACTG TGGAAGACTA CACAGTGTTT GAAAGAAGTC
551 AGGGAAGCTC CTCTGAAGAC ACAACATACT TAGGCACATT GGCCAGTTCC
601 TCTGATGTCT CCATGCCTAT TCTCGGCCCT TCTCTGCTGT TCTGTGGGAA
651 GCCAGCTTGC TGGATCAGAG GCAGTAATCC ACAGGACAAG AGGCAGGTTG
701 GGATTGCTGC TCACTTGACC TTTGAAACCC TAGAAGGCGA GAAAACCTCC
751 TCAGAACTGA CTGTGGTCAA TAATGGCACC GTGGCCATTT GGTATGACTG
801 GCGACGGCAG CACCAGCCGG ACACTTTCCA AGACCTTAAG AAAAACAGGA
851 TGCAGCGATT TTAGTTTGAC AACCGGGAAG GTGTGATTCT GCCTGGAGAA
901 ATTAAAACAT TTACCTTCTT CTTCAAGTCT TTGACTGCTG GGGTCTTCAG
951 GGAATTTTGG GAGTTTCGAA CCCATCCTAC TCTATTAGGA GGTGCTATAC
1001 TGCAGGTCAA TCTCCACGCG GTCTCCCTGA CCCAGGACGT TTTTGAGGAT
1051 GAGAGGAAAG TACTGGAGAG CAAGCTGACT GCCCATGAGG CAGTCACCGT
1101 CGTTCGCGAA GTGCTGCAGG AGCTGCTGAT GGGGGTCTTG ACCCCGGAGC
1151 GCACACCATC ACCTGTGGAT GCCTATCTCA CCGAGGAAGA CTTGTTCCGG
1201 CACAGAAATC CTCCGCTGCA TTATGAGCAC CAAGTGGTGC AAAGCCTGCA
1251 CCAACTGTGG CGCCAGTACA TGACCCTGCC CGCCAAGGCT GAGGAGGCCA
1301 GGCCAGGGGA CAAGGAGCAC GTCAGCCCCA TAGCCACAGA GAAGGCCTCT
1351 GTGAATGCTG AGCTGTTACC ACGCTTTAGG AGCCCCATCT CCGAAACTCA
1401 AGTGCCCCGG CCTGAGAACG AGGCCCTCAG GGAATCCGGG TCCCAGAAGG
1451 CCAGAGTGGG GACCAAGAGT CCTCAGCGGA AGAGCATCAT GGAGGAGATC
1501 CTGGTGGAGG AAAGCCCAGA TGTGGACAGC ACCAAGAGCC CCTGGGAGCC
1551 GGATGGCCTT CCCCTGCTGG AGTGGAACCT CTGCTTGGAG GACTTCAGAA
1601 AGGCAGTGAT GGTGCTCCCT GATGAGAACC ACAGAGAGGA TGCGTTGATG
1651 AGGCTCAACA AAGCAGCCCT GGAGCTGTGC CAGAAGCCAA GGCCATTGCA
1701 GTCCAACCTC CTGCACCAGA TGTGTTTGCA GCTGTGGCGA GATGTGATTG
1751 ACAGCCTGGT GGGCCATTCC ATGTGGCTGA GGTCTGTGCT GGGCCTGCCT
1801 GAGAAGGAGA CCATCTATTT GAATGTGCCT GAAGAGCAAG ATCAAAAATC
1851 ACCTCCTATC ATGGAAGTGA AGGTACCTGT GGGGAAAGCT GGGAAGGAGG
1901 AGCGGAAAGG AGCAGCCCAG GAAAAGAAGC AACTGGGGAT CAAAGACAAA
1951 GAAGACAAGA AAGGAGCCAA GCTGCTCGGG AAAGAGGACC GTCCCAACAG
2001 CAAGAAGCAC AAGGCAAAGG ATGACAAGAA AGTCATAAAA TCTGCAAGTC
2051 AGGACAGGTT TTCTTTGGAA GACCCTACCC CTGACATCAT CCTCTCTTCT
2101 CAAGAACCCA TAGACCCCCT GGTCATGGGG AAATACACCC AGAGGCTGCA
2151 CAGTGAGGTC CGTGGGCTGC TGGACACCCT GGTGACCGAC CTGATGGTCC
2201 TGGCTGATGA GCTCAGCCCC ATAAAGAATG TCGAGGAGGC TTTGCGCCTC
2251 TGCAGGTGAC TCTCGGGCCC AAGCAACCTT CTGGAAAACG GGTTAATAAA
2301 TAAATCAATA AAGAACCTTC AAGTTTCTAC TAAAAAAAAA AAAAAAAAAA
2351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA GGGCGGCCG

BLAST Results

No BLAST result

Medline entries

No Medline entry

Peptide information for frame 1

ORF from 118 bp to 2256 bp; peptide length: 713
Category: putative protein
Classification: Cell structure/motility

1 MHTGETYRRI (2EERELIDCT LPTRRDRKSU ENSGFUSRLE YLGDEMTGLV 

51 MTKTKTflRGL MEPITHIRKP HSIRVETGLP AfiRDASYRYT WDRSLFLIYR 

1D1 RKEL<2RIMEE LDFSCJQDIDG LEVVGKGUPF SAVTVEDYTV FERSflGSSSE 

50 151 DTTYLGTLAS SSDVSMPILG PSLLFCGKPA CWIRGSNPtJD KRQVGI AAHL 

B01 TFETLEGEKT SSELTVVNNG TVAIUYDURR <2H<2PDTFfll>L KKNRMfiRFYF 

E51 DNREGVILPG EIKTFTFFFK SLTAGVFREF UEFRTHPTLL GGAILflVNLH 

3D1 AVSLTflDVFE DERKVLESKL TAHEAVTVVR EVLfiELLMGV LTPERTPSPV 

351 DAYLTEEDLF RHRNPPLHYE H<2VV<3SLH<JL URtJYMTLPAK AEEARPGDKE 

55 M01 HVSPIATEKA SVNAELLPRF RSPISETOVP RPENEALRES GS<3KARVGTK 

M 51 SPI2RKSIMEE ILVEESPDVD STKSPUEPDG LPLLEUNLCL EDFRKAVMVL 

5D1 PDENHREDAL MRLNKAALEL CfiKPRPLcJSN LLHflMCLflLU RDVTDSLVGH 

551 SMULRSVLGL PEKETIYLNV PEECPflKSPP IMEVKVPVGK AGKEERKGAA 
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bOl (3EKK<3LGIKD KEDKKGAKLL GKEDRPNSKK HKAKDDKKVI KSASGDRFSL 

bSl EDPTPDIILS SflEPIDPLVM GKYTflRLHSE VRGLLDTLVT DLMVLADELS 
?□! PIKNVEEALR LCR 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15n14, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphtes3_15n14, frame 1

Report for DKFZphtes3_15n14.1



[LENGTH] 713
[MW] 81780.53

CBL0CKS3 PFDDflTflC 
[BLOCKS] BL00690C DEAH-box subfamily ATP-dependent helicases proteins
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 4.07 %

30 

SEt2 nHTGETYRRItJEERELIDCTLPTRRDRKSUENSGFUSRLEYLGDEnTGLVflTKTKTflRGL 
SEG 

PRD ccchhhhhhhhhhhhhhhhccccchhhhhhccccccceeeeccccceeeeeecccccccc 

35 SE<3 MEPITHIRKPHSIRVETGLPAtJRD ASYRYTL)DRSLFL1YRRKEL(3RIMEELDFS(3<3DIDG 
SEG 

PRD cccccccccccceeeeeccccchhhhhhhcccchhhhhhhhhhhhhhhhhhcccccccce 

SE<2 LE VVGKGUPFSAVTVEDYTVFERStfGSSSEDTTYLGTLASSSDVSflPILGPSLLFCGKPA 
40 SEG 

PRD eeeeeccccceeeeecceeeeeecccccccceeecccccccccccccccccceeeecccc 

SE<3 CUIRGSNP(2DKRr2VGIAAHLTFETLEGEKTSSELTVVNNGTVAIUYDliIRR(3H(2PDTFt3DL 
SEG 

45 PRD eeeeccccccchhhhhhhhhheeecccccccceeeeecccceeeeehhhhhccccchhhh 

SE<3 KKNRNfiRF YFDNREGVILPGEIKTFTFFFKSLTAGVFREFWEFRTHPTLLGGAILt3VNLH 
SEG 

PRD hhhhhhhhhcccccccccccceeeeeeeehhhhhhhhhhhhhhhcccccccchhhhhhhh 

50 

SEC2 AVSLTtfDVFEDERKVLESKLTAHEAVTVVREVLGELLMGVLTPERTPSPVDAYLTEEDLF 

SEG 

PRD hhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccceeeeccccc 

55 SEG RHRNPPLHYEH(3 VV(3SLH(3LlilR(3YnTLPAKAEEARPGDKEHVSPIATEKASVNAELLPRF 
SEG 

PRD cccccccccccchhhhhhhhhhhhhhhhhhhhhccccccccccccchhhhhhhhhccccc 
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SE<2 RSPISETflVPRPENEALRESGSflKARVGTKSPfiRKSIIIEEILVEESPDVDSTKSPUEPDG 

SEG , 

PRD cccccccccccccchhhhhcccccccccccccchhhhhhhhhhhcccccccccccccccc 

5 SEfl LPLLEWNLCLEDFRK AVnVLPDENHREDALnRLNKAALELC(3KPRPL(3SNLLH(2nCLt3LUI 

SEG 

PRD ccccchhhhhhhhhhhhccccccchhhhhhhhhhhhhhhhhhcccccchhhhhh 

SEt2 RDVIDSLVGHSnWLRSVLGLPEKETIYLNVPEEt3D(2KSPPII1EVKVPVGKAGKEERKGAA 

10 SEG 

PRD hhhhhhhhccchhhhhhccccccceeeeecccccccccccceeeeeccccchhhhhhhhh 

SE<2 (3EKKc3LGIKDKEDK:KGAKLLGKEDRPNSK:KHKAKDDKKVIKSAS(3DRFSLEDPTPDIILS 






SEG 



XXXX XX XXX XX XX XX XX . 



15 PRD hhhhhhccccccccchhhhhcccccccccccccccccGeeeecccccccccccccceeee 
SEfl S(3EPIDPL VMGICyTt3RLHSEVRGLLDTLVTDLMVLADELSPIKNVEEALRLCR 



• xxxxxxxxxxxx . 



SEG 

PRD ccccccceeechhhhhhhhhhhhhhhhhhhhhhhhhhhccccchhhhhhhccc 

(No Prosite data available for DKFZphtes3_15n14.1)
(No Pfam data available for DKFZphtes3_15n14.1)
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DKFZphtes3_lbb5 



PCT/IB01/02050 



group: cell structure and motility

DKFZphtes3_16b5 encodes a novel 268 amino acid protein with
similarity to various tropomyosins.

Tropomyosins play regulatory roles in cellular structure and
transport.

The new protein can find application in modulating cell structure
and motility as well as modulating cellular transport.

weak similarity to KIAA0774
perhaps complete cds.
Sequenced by DKFZ
Locus: unknown

Insert length: 1316 bp

Poly A stretch at pos. 1247, polyadenylation signal at pos. 1232



   1 TGCTAAAATG GAATTAGAGA GAAGCATAGA CATCAGCAGA AGACAGAGTA
  51 AGGAGCACAT ATGTAGAATT ACAGATCTAC AAGAGGAATT AAGACACAGA
 101 GAGCATCACA TCTCTGAATT GGATAAGGAG GTTCAGCACC TTCATGAGAA
 151 TATAAGTGCC CTAACCAAAG AACTGGAATT TAAGGGGAAA GAAATTCTCA
 201 GAATACGAAG TGAATCTAAC CAACAGATAA GGTTGCATGA ACAAGATTTA
 251 AACAAGAGAC TTGAAAAAGA GTTGGATGTC ATGACAGCAG ACCACCTCAG
 301 AGAGAAAAAT ATCATGCGGG CAGATTTTAA TAAGACTAAC GAGCTACTCA
 351 AGGAAATAAA TGCCGCTTTA CAAGTGTCAT TAGAAGAAAT GGAAGAAAAA
 401 TATCTAATGA GAGAATCAAA ACCAGAAGAT ATACAGATGA TTACAGAATT
 451 AAAAGCCATG CTTACAGAAA GAGACCAGAT CATAAAGAAA CTAATTGAGG
 501 ATAATAAGTT TTATCAGCTG GAATTAGTCA ATCGAGAAAC TAACTTCAAC
 551 AAAGTGTTTA ACTCAAGTCC TACTGTTGGT GTTATTAATC CATTGGCTAA
 601 GCAAAAGAAG AAGAATGATA AATCACCAAC AAACAGGTTT GTGAGTGTTC
 651 CCAATCTAAG TGCTCTGGAA TCTGGTGGAG TGGGCAATGG ACATCCTAAC
 701 CGCCTGGATC CCATTCCTAA TTCTCCAGTC CACGATATTG AGTTCAACAG
 751 CAGCAAACCA CTTCCACAGC CAGTGCCACC TAAAGGGCCC AAGACATTTT
 801 TGAGGTATCA GTAAGATGCA TGTGCATGAG CTCAAGGAAC ATGACTACTG
 851 GAGTTTCCAT TACACATTGT TGCGTGCCTT GTAATTTTCC CCAAAGACGT
 901 CCTGCTCAGA GTGAAGCTTC TCCAGTGGCT TCTCCAGATC CCCAGCGCCA
 951 GGAGTGGTTT GCCCGGTACT TCACATTCTG AAAGAATTGT GTTGGCACAG
1001 CTCTGTATAG ACTGTTACTA AGAGCATGAC TTTATACAGA TTGTTATGTA
1051 AATAGGCTTT CCTATGTCAA ACACTGTGAA TGAGAAAGTA TTTGTCTCTC
1101 CAACTTGAAA ATGCACTGTA TTTCCTGTGA TATTTATTGG AATCATTCTA
1151 TAAGGTACTA TATTATGTGT GTAATTATAA CTGTTATTTT TATTTGAGAT
1201 GGAAGAGTCT TTAACCTTTG TAATTACTGC ATAATAAATT TTGTTAGAAT
1251 CAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
1301 AAAAAAAAAA AAAAAA



BLAST Results 



WO 01/98454 






No BLAST result 

Medline entries 

No Medline entry 

Peptide information for frame 2

ORF from 8 bp to 811 bp; peptide length: 268
Category: similarity to known protein
Classification: Cellular transport and traffic

  1 MELERSIDIS RRQSKEHICR ITDLQEELRH REHHISELDK EVQHLHENIS
 51 ALTKELEFKG KEILRIRSES NQQIRLHEQD LNKRLEKELD VMTADHLREK
101 NIMRADFNKT NELLKEINAA LQVSLEEMEE KYLMRESKPE DIQMITELKA
151 MLTERDQIIK KLIEDNKFYQ LELVNRETNF NKVFNSSPTV GVINPLAKQK
201 KKNDKSPTNR FVSVPNLSAL ESGGVGNGHP NRLDPIPNSP VHDIEFNSSK
251 PLPQPVPPKG PKTFLRYQ



BLASTP hits 

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_16b5, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphtes3_16b5, frame 2

Report for DKFZphtes3_16b5.2

[LENGTH] 270

eiiio am^-m 

45 • CpO b-^D 

EH0M0L3) PIR:A57013 early endosome antigen 1 - human le-D5 

EFUNCATJ 03-1*1 recombination and dna repair IS- cerevisiae-i 

Y0L03HW3 le-05 

EFUNCATJ 03-22 cell cycle control and mitosis ES. cerevisiae-i 
50 YFR031cID 2e-05 

EFUNCAT3 30.10 nuclear organization IS- cerevisiaen YFRD31cH 
2e-0S 

CFUNCATJ 11-0*4 dna repair (direct repairi base excision repair 
and nucleotide excision repair) IS. cerevisiae-i YKROTSwU 5e-05 
55 CFUNCATJ 3D.0H organization of cytoskeleton ES- cerevisiae-. 

YI>R35bw3 7e-05 

EFUNCATID 0^-10 nuclear biogenesis HIS- cerevisiae-. YDR35bw3 

?e-05 
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EFUNCAT3 Ofl.07 vesicular transport (golgi network-, etc) ES- 
cerevisiae-. YDLDSfiwJ le-04 

EFUNCATJ 30-03 organization of cytoplasm ES- cerevisiae-i 

YDLOSflwID le-04 

5 EFUNCATJ 1 genome replication! transcription i recombination and 
repair EM - jannaschii-. I1JlbM33 2e-04 

EFUNCAT3 1*1 unclassified proteins ES- cerevisiae-. YLR3DTcl 

3e-04 

EFUNCAT3 Ofl-lt extracellular transport ES- cerevisiae-. 

10 YNL572c3 Se-04 

EFUNCATJ 30.0*1 organization of intracellular transport vesicles 

ES- cerevisiae-. YNL27Ec3 Se-04 
EKIO All_Alpha 

EKU3 L0W_C0I1PLEXITY 4-61 V. 

15 EKIO C0ILED_C0IL 10.74 '< 

SE(3 AKMELERSIDISRRflSKEHICRITDLtfEELRHREHHISELDKEVflHLHENISALTKELEF 

SEG 

20 PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

CCCCCCCCCCCCCCCCCCCCCCCC 

SEfl KGKEILRIRSESN£3(3IRLHEODLNKRLEKEL])VnTADHLREKNIHRADFNKTNELLKEIN 

25 SEG 

PRD hhhhhhhhhhcchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

ccccc 

30 SE<3 AALf3VSLEEnEEKYLnRESKPEDIt2niTELKAriLTERDt2IIKKLIEDNKFYt!LELVNRET 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhn 
COILS 






SEC NFNKVFNSSPTVGVTNPL AK<3KKKNDKSPTNRFVSVPNLSALESGGVGNGHPNRLDPIPN 

SEG 

PRD hhhhhhhcccceeeehhhhhhhhhhccccccceeeccccccccccccccccccccccccc 
COILS 



SE<2 SPVHDIEFNSSKPLPfiPVPPKGPKTFLRYfl 

SEG xxxxxxxxxxxxx - 

PRD ccceeeeecccccccccccccccceeeccc 
45 COILS • 



(No Prosite data available for DKFZphtes3_16b5.2)
(No Pfam data available for DKFZphtes3_16b5.2)

DKFZphtes3_16p3

group: testis derived
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WO 01/98454 ^PCT/IBOl/02050 

DKFZphtes3_16p3 encodes a novel 1663 amino acid protein without
similarity to known proteins.

The novel protein is glutamine rich and contains a cell
attachment RGD motif. According to the low number of ESTs and
their origin, the protein seems to be expressed ubiquitously at
low levels.
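
The presence and position of a short motif such as RGD can be checked with a plain substring scan. The following is a minimal sketch, not the method used to produce this report: the function name `find_motif` is hypothetical, positions are assumed to be 1-based residue numbers, and the input is a short illustrative fragment rather than the full-length protein.

```python
def find_motif(protein, motif="RGD"):
    """Return the 1-based start positions of every occurrence of motif."""
    positions = []
    start = protein.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to residue number
        start = protein.find(motif, start + 1)  # step by one to catch overlaps
    return positions


# Illustrative fragment only (assumption: not the complete sequence).
fragment = "SRNLKLGSAFPRGDLA"
print(find_motif(fragment))
```

A substring scan of this kind only reports where the tripeptide occurs; it says nothing about whether the motif is functional in cell attachment.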

No informative BLAST results; no predictive Prosite, Pfam or SCOP
motif.

The new protein can find application in studying the expression
profile of testis-specific genes.

putative protein

perhaps complete cds.
Sequenced by DKFZ

Locus: unknown

Insert length: 5411 bp

Poly A stretch at pos. 5354, polyadenylation signal at pos. 5340

25 



1 GGCGGCCAGG TGGAGGACCT 
SI GGTGCAGGGC ATCGCCACGC 
101 TTGACCTGGC CGCGCTAGAG 
30 151 GCGTTCGATA GGGTGCGGAC 

EDI GCTCAGCTTT GCCAGGGTAC 
551 TATTCAAAGA TCGGGAGCAA 
301 TTGGTTCCTG GTGCAGAAGA 
351 GCAGGCGATT ACGGACGGCT 
35 M01 TTATGGGATT TTCTAAGCAC 

M51 GGGACTCTAA GCGGAGACTC 
501 GGATTCTGCC AGTGGTCTTG 
551 GCACAGCACA CCCCTCTGAT 
bOl CCCTCTGGTA CTGGGAGACA 
40 b51 CGTGCCACGA CTCCATCAGT 

7D1 ATCGTCACAG GAGTAGAGAG 
751 GCACGTCCTG GTCCAGTTCA 
SOI CAGTAGTGTG CCCGCTAGCC 
flSl GTGGGTTAGA ACCAACTGGC 
45 101 ACTTACCCAC ATGGTGTGGT 
151 ACCACCTGAA ATGGATGATC 
1001 AACGTATGTT GCCACCATCA 
1051 CTACCTAGCA CAGACCAACA 
1101 TGGTATGACA TTTCCTGGCA 
50 1151 TGGATCAGCG TGGATGTGTA 
1501 CCCCCTGGTA TAGACCAGCA 
1551 TGGCCTGGTT CTACCTTTTA 
1301 TGATGCCAAT TAGTGCAGAT 
1351 GCAACTGGCT TCATACAACC 
55 1M01 TGGCAGATTT CAGCGTGCTT 
1M51 TGGTCCA ACC TGGTGCAGAT 
1501 CAGTCT6GTT TG6CCCAACC 
1551 TGGAATGGAT CAGTCTGGTT 



GAGCAAGCAG CTCAAGCGTG TGGACGGCCA 
ACGTGCAGCA CTTCTCCCAG GCCAGCGGGC 
TGGCCGGAGG A6CAGGAGGT GGGCGTGCGG 
TGGGAGTATC ATGAAGGACG CCGCCGAGGA 
TTTTACAGCG GGTTGATGAA CTAGAGAAGC 
TTCCTGGAAC TAGTCAGCCG GAAGCTGAGT 
AGTCACCATG GTCACCTG6G AAGAGCTGGA 
GGAGAGCCTC ACAAGCGGGC TCAGAAACAC 
GGAGGGTTCA CTTCCTTAAC ATCACCTGAA 
TACCA AGCAA CCAAGTATTG AGCAGGCTCT 
GCGCGGATCG GACTGCATCA GGATCTGGTG 
GGGGTTTCCA GTAGGGAACA AAGCAAGGTC 
GCAGCAGCCG AGGGCCCGTG ATGAAGCTGG 
CTTCTACATT CCAATTCAAA TCAGACTCAG 
AAGCTTACCT CGACACAACC AAGAAGAAAT 
ACAGGACTTA CCCTTGGCCA GAGACCAGCC 
AGAGTCAGGT CCATCTAAGG CCAGATCGTC 
ATGAATCAGC CTGGATTAGT GCCTGCTAGC 
ACCCCTCAGC ATGGGTCAGC TTGGTGTGCC 
GGGAATTGAT ACCATTTGTC GTGGAT6A6C 
GTACCTGGCA GAGACCAGCA AGGATTGGAA 
TGGTCTGGTT TCAGTCAGTG CATATCAGCA 
CAGACCAACG CAGTATGGAA CCACTTGGCA 
ATATCAGGCA TGGGTCAGCA AGGACTAGTA 
AGGATTGACA TTGCCTGTCG TCGATCAACA 
CAGACCAGCA TGGTTTGGTA TCACCTGGTT 
CAGCAAGGTT TTGTGCAGCC CAGTTTGGAA 
TGGCACAGAG CAGCATGATT TGATCCAGTC 
TGGTGCAGCG TGGTGCATAT CAGCCTGGCT 
CAGCGTGGTT TGGTCCGGCC TGGAATGGAT 
TGGTGCAGAT CAGCGTGGTT TGGTCTG6CC 
TGGCCCAACC TGGTAGAGAT CAGCATGGTT 
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IbOl TGATCCAGCC TGGCACAGGT 
IbSl CAGGGTGTCT TGGTACAGCC 
1701 TGGCAGATTT CAGCGTGCTT 
1751 TGGTCCAACC TGGTGCAGAT 
5 IflOl CAGCATGGTT TGGTACAATC 
1851 TGGTGCAGTT CAGCATGGTT 
1101 TGGCACAACC TCGTGCAGAT 
1151 GATCAGCGTG GTTTGGTCCA 
S0Q1 ACCTGGAGTG GATCAGCATG 

10 2051 GTTTGGTGCA ACCTGGTATA 
21Q1 GTTCAGCGTG GTTTGGTGCA 
2151 ACCTGGAGTG GATCAGCGTG 
2201 GTTTGGTCCA ACCTGGTGCA 
2251 GATCAGCGTG GTTTGGTCCA 

15 2301 ACCTGGAGTG GATCAGCGTG 
2351 GTTTGATCCA ACCTGGTGCA 
2401 GGTCAGCTGG GTATGGTGCA 
2451 ACCTCAGGCA GATCCACATG 
2501 GTTTGGTACA ACCTGGTGCA 

20 2551 TATCCACGTG GTCTGGTGCA 
2b01 ACCTGGTGCA TATCAGCCAG 
2bSl GCTCTTCAAC ATTCCAGGCA 
2701 TATCAACATG GTATGGTACC 
2751 ACCACTCCTA GCCAGTCAAG 

25 26D1 GTTTGGTACC ACCAGAAACT 
2651 GACCAGCACA GCCCAATACC 
2101 AGATCAACAG CATGTGGCAT 
2151 ACCCAGATGC AGCTCAGCAT 
3DD1 GATTCAATGT ATCCTGGTTA 

30 3051 GCATGGCCAG GAAGGTTTGG 
3101 ATGGAATTCC TGCCCAGAAG 
3151 AGTCCAGACT CCGTCGACCG 
3201 TGAAGTCCT6 AGTGAGCGAC 
3251 TCCCCACGGC AGTGGAGACA 

35 3301 TATGTGGGGC TAAAGGAGA6 
3351 CCAAACCGAC TTGGAGAAGA 
3401 GGACCATACC TCCTGAACTG 
3451 GCCAAAGAAG TTT6GCAGGA 
35D1 CCTGGAAGGG GAAGGGAATC 

40 3551 AGCTGAGATT GCAGCTGGGT 
3b01 AAGGAGCTGG CCGAGTTGAG 
3b51 GGAAAATTCT GTCTCTGAAG 
37D1 AGCTCAGGAT GATCATTGAG 
3751 TCCATGAGCA TGGCCCCGCA 

45 3A01 CGACCCTGAG GCCACCTGTC 
3flSl TCAGCACGCT GGTGCGGC6C 
3101 CTGGCCGTCT CCCGACCCTC 
3151 GGAGCTGCTG GGCCGTGTGC 
4001 GCGAGAA6CT CAACATCACC 

50 4051 AAACAGAAGG ACATTGCTAT 
4101 GGAAAAGGCC AACAGGGAGC 
4151 ACAAGAGTGC TCTGGCCACC 
4201 ACGGAGCAGC TGAACCACAT 
4251 GCAGGAGCAG GACTGGCAGA 

55 4301 ACAACAAGCT GGACCGCCTG 
4351 GATCGGTGGA AATCGCTGCG 
44D1 CCAGGCAGAC GAGGCGGCTG 
4451 ACTGCCTCTC ATGTGACCGG 



CAGCATGATT TGGTCCAATC T6GCACAGGT 

TG6TGTAGAT CAGCCTGGCA TGGTCCAACC 

TGGTGCAGCC TGGTGCATAT CAGCCT6GCT 

CAGATTGATG TGGTGCAACC TGGTGCAGAT 

TGGTGCAGAT CAGAGTGATT TGGCTCAACC 

TGGTCCAACC TGGAGTAGAT CAGCGTGGTT 

CATCAGCGTG GTTTGGTCCC ACCTGGTGCA 

ACCTGGTGCA GATCAGCATG GTTTGGTCCA 

GTTTGGCACA ACCTGGTGAA GTTCAGCGTA 

GTTCAGCGTG GTTTGGTGCA ACCTGGTGCA 

ACCTGGTGCA GTTCAGCGTG GTTTGGTCCA 

GTTTGGTTCA ACCTGGTGCA GTTCAGCGTG 

GTTCAGCATG GTTTGGTCCA ACCTGGTGCA 

ACCTGGAGTG GATCAGCGTG GTTTGGTGCA 

GTTTGGTCCA ACCTGGAATG GACCAGCGTG 

GATCAGCCTG GTTTGGTCCA GCCTGGTGCA 

GCCTGGAATA GGTCAGCAAG GTATGGTGCA 

GCCTG6TACA ACCTGGTGCC TATCCTCTTG 

TATTTGCATG ATTTATCTCA ATCTGGGACA 

GCCAGGAATG GATCAGTATG GTTTGAGACA 

GCTTGATAGC ACCAGGCACA AAGCTTCGTG 

GATTCTACAG GTTTTATATC AGTACGTCCA 

TCCTGGCAGA GAACAATACG GCCAGGTGTC 

GTTTGGCATC ACCTGGTATA GATCGAAGGA 

TATCAGCAAG GTTTGATGCA TCCTGGCACA 

ACTGAGTACA GGTTTGGGAT CTACACACCC 

CACCTGGCCC AGGTGAGCAT GACCAGGTAT 

GGCCATGCTT TCTCTCTCTT TGACAGTCAT 

TCGTGGCCCA GGGTATCTAA GTGCTGATCA 

ATCCAAATAG AACACGAGCC TCGGACCGAC 

GCCCCAGGCC AAGATGTCAC TCTTTTCAGG 

AGTCTTATCA GAAGGG AGCG AAGTCTCGAG 

GCAATTCACT GCGTAGAATG AGTTCTAGTT 

TTTCATCTGA TGGGAGAGCT CAGTAGCCTC 

TATGAAGGAT CTGGATGAGG AGCAGGCCGG 

TCCAGTTCCT GCTGGCACAG ATGGTCAAAA 

CAGGAGCAGC TGAAGACCGT AAAGACGCTA 

GAAAGCAAAA GTGGA A AGGC TGCAGAGGAT 

AAGAAGCAGG GAAGGAACTG AAAGCTGGAG 

GTCCTCAGAG TCACCGTGGC TGACATAGAA 

GGAGAGCCAA GACAGGGGCA AGGCTGCCAT 

CCTCCCTTTA CCTGCAGGAC CA6TTGGACA 

AGCATGCTGA CCTCCTCCTC CACGCTCCTG 

CAAGGCCCAC ACCTTGGCTC CTGGCCAGAT 

CAGCCTGCAG CCTGGATGTG AGCCATCAGG 

TATGAGCAAC TCCAAGACAT GGTCAACAGC 

CAAGAAGGCC A AGCTCCAGA GACAGGACGA 

AGAGTGCCAT CCTGCAGGTG CAGGGTGACT 

ACCAGCAACC TCATCGAGGA CCATCGGCAG 

GCTGTACCAG GGTCTGGAGA AGCTCGAAAA 

ACCTGGAGAT GGAGATCGAT GTGAAAGCCG 

AAAGTGAGCC GTGTCCAGTT TGATGCCACC 

GATGCAGGAG CTGGTGGCCA AGATGAGCGG 

AGATGCTGGA CAGGCT6CTC ACAGAGATGG 

GAGCTGGACC CAGTGAAGCA GTTGCTGGAG 

ACAGCAGCTC AGGGAGCGCC CCCCACTCTA 

CCATGCGGAG GCAGCTCCTG GCACATTTCC 

CCCTTGGAGA CACCTGTGAC TGG ACATGCC 
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WO 01/98454 

4501 ATCCCCGTGA 
45S1 CCCCTACACG 
MbDl TCAAGCTGGG 
MbSl AGCGTGGGGC 
5 4701 GAAGGT6CAG 
4751 TCCGCGAGCT 
4SD1 GTGACAGATA 
4B51 GGGCAGCCAC 
4101 TTCCCCGGGG 

10 4151 GATGAGGTGG 
SDD1 GGACACAAGG 
5051 AGCGCA AGTC 
5101 AGCAGCAATG 
5151 TGGCAACACC 

15 SEQ1 CCGAACACAG 
SSS1 GGAGAGGCAC 
5301 GGGGGCCGCG 
5351 CAGGAGGAAA 
54D1 AAAAAAAAAA. 

20 



CCCCCGCGGG TCCAGGCCTA 
GTGTTTGAAC TGGAGCAGGT 
CAGCGCCTTC CCTCGGGGTG 
GCCTGCGCTC CATGCACTCC 
ATCCACTTCG GGGGCTCCAC 
GCTGCACGCC CAGTGCCTGG 
TGGCTGATTA CACCTACTCA 
ACCCTCACCT ACCCCTACCA 
CCTGTATCCT ACTGAAGAGA 
ACATCTTGGG CCTGGATGGC 
CTGCCAGGCA TCCTCCGAAA 
CCAGCAGCCC AGGCCCCACG 
GCCAGCTGCC CTCTCGGCCA 
TCAGAAAGAT AGACCTTCCT 
CCCACCCGCC CAGCTCCGCC 
GTGGACATGC CTCCTGGGGA 
GTCCAGCACC GCTCAGTGAG 
AAAAAAAAAA AAAAAAAAAA 
A 



BLAST Re 



PCT/TB01/02050 

CCTGGGCACC ATTCCATCCG 
CCGGCAGCAT AGCCGCAACC 
ACCTGGCGCA 6ATGGAGCAG 
AAGATGCTGA TGAAC ATTGA 
CAAGGCCAGC AGCCAGATAA 
GCTCCCCCTG CTACA AACGG 
ACTGTGCCCC GGCGC1GCGG 
CCGCAGCCGC CCGCAGCACC 
TCCAGATTGC CATGAAGCAT 
CACATTTACA AGGGACGGAT 
AGACAGCTCA GGGACCTCAA 
TGCACAGGCC GCCATCCCTC 
CAGAGCGCCC AGATTTCGGC 
CCGAGGGCCG TCTCTCCCAG 
TCGGTG6CAA ACAGGGGGCT 
GGGGCTCGAG GAGCCCACGC 
CGGAGGTGTA AATAAACATT 
AAAAAAAAAA AAAAAAAAAA 



Its 



25 No BLAST result 

Medline entries 



30 

No Medline entry 



Peptide information for frame 1

ORF from 181 bp to 5169 bp; peptide length: 1663
Category: putative protein
Classification: no clue

Prosite motifs: RGD (1482-1484)

1 MKDAAEELSF ARVLLflRVDE LEKLFKDREfl FLELVSRKLS LVPGAEEVTH 

45 51 VTUEELEUAI TDGURASfl AG SETLMGFSKH GGFTSLTSPE GTLSGDSTKfl 

101 PSIEflALDSA SGLGPDRTAS GSGGTAHPSD GVSSREOSKV PSGTGR<2(3<3P 

151 RARDEAGVPR LHflSSTFflFK SDSDRHRSRE KLTSTflPRRN ARPGPVflflDL 

SD1 PLARDfiPSSV PASflSflVHLR PDRRGLEPTG PINflPGLVPAS TYPHGVVPLS 

251 (IGtJLGVPPPE MDDRELIPFV VDEflRPILPPS VPGRDflflGLE LPSTDtfHGLV 

50 301 SVSA YiJHGHT FPGTDflRSME PLGMDflRGCV ISGf1G(2(2GL V PPGIDflcJGLT 

351 LPVVDdHGLV LPFTDtfHGLV SPGLNPISAP <2<2GFV<2PSLE ATGFIflPGTE 

401 <2HDLI(2SGRF flRALVlJRGAY <3PGLV(JPGAD fiRGLVRPGMD OSGLAflPGAD 

451 flRGLVUPGIID tJSGLAflPGRD flHGLIGPGTG (2HDLVt3SGTG (3GVLV(3PGV1> 

501 OPGHVflPGRF ORALVflPGAY OPGLVOPGAD flIDVVflPGA]) (2HGL VflSGAD 

55 551 flSDLAflPGAV flHGLVflPGVD <2RGLA(2PRAD HflRGLVPPG A DflRGLVCPGA 

t.01 DflHGLVtJPGV DCHGLAUPGE V<2RSLV<2PGI VORGLVQPGA VflRGLVflPGA 

bSl VtJRGLVflPGV D<2RGLV<2PGA VflRGLVflPGA VflHGLVflPGA D<3RGLV<2PGV 

701 D(3RGLV(3PGV D(2RGLV(2PGM D(3RGLI<2PG A DQPGLVtJPGA G<2LGMVflPGI 
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10 






751 
flOl 
fl51 
TOl 
TSl 
1001 
1051 
1101 
1151 
1E01 
1551 
1301 
1351 
1MD1 
1M51 
1501 
1551 
IbOl 
lb51 



GQQGr\VQP<2h 
D(3YGLRc2PGA 
EflYGtiVSPLL 
GLGSTHPD(2(2 
GYLSAD(2HG<2 
EGSEVSSEVL 
LDEEflAGGTD 
VERLflRILEG 
DRGKAANENS 
TLAPGfllDPE 
KLC2RC2DEELL 
GLEKLEKEKA 
LVAKflSGflEdJ 
RERPPLYUAD 
PGHHSIRPYT 
KMLMNIEKVfl 
TVPRRCGGSH 
HIYKGRMDTR 
(3S AdISAGNT 



DPHGLVflPGA 
YflPGLIAPGT 
ASflGLASPGI 
HVASPGPGEH 
EGLDPNRTRA 
SERRNSLRRM 
LEKIflFLL A(2 
EGNOEAGKEL 
VSEASL YL(3D 
ATCPACSLDV 
GRVCSAILflV 
NREHLEMEID 
DblQKMLDRLL 
EAAAHRRfJLL 
VFELEflVRflH 
IHFGGSTKAS 
TLTYPYHRSR 
LPGILROSS 
SER 



YPLGLVC3PGA 
KLRGSSTFCJA 
DRRSLVPPET 
DflVYPDAAflH 
SDRHGIPAflK 
SSSFPTAVET 
["lvKRTIPPEL 
KAGELRLOLG 
(3LDKLRMIIE 
SHCIVSTLVRR 
(3GDCEKLNIT 
VKADKSALAT 
TEF1DNKLDRL 
AHFHCLSCDR 
SRNLKLGSAF 
SCIIRELLHA 
PCHLPRGLYP 
GTSKRKSflflP 



YLHDLSflSGT 
DSTGFISVRP 
Y(3(JGLHHPGT 
GHAFSLFDSH 
APGiSDvTLFR 
FHLPIGELSSL 
<2E(3LKTVKTL 
VLRVTVADIE 
SMLTSSSTLL 
YEt3L<3DMVNS 
TSNLIEDHRfl 
KVSRVflFDAT 
ELDPVKflLLE 
PLETPVTGHA 
PRGDLAflllEfl 
(3CLGSPCYKR 
TEEICJIANKH 
RPHVHRPPSL 



PCT/IB01/02050 

YPRGLVdPGfl 
YflHGMVPPGR 
DCJHSPIPLST 
DSMYPGYRGP 
SPDSVDRVLS 
YVGLKESMKD 
AKEVUC2EKAK 
KELAELRESfl 
SMSMAPHKAH 
LAVSRPSKKA 
KtJKDIAMLYfl 
TEfiLNHMIKaE 
DRUKSLRflflL 
IPVTPAGPGL 
SVGRLRSMHS 
VTIN1ADYTYS 
DEVDILGLDG 
SSNGflLPSRP 



BLASTP hits 

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_16p3, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphtes3_16p3, frame 1

Report for DKFZphtes3_16p3.1



([LENGTHS 1753 
EllliO lfl735i*. e iA 
EpIJ b-11 

EH0I10L3 TREMBL:AF0S54bl_f gene: "J101D1 • 5"n Caenorhabditis 

elegans cosmid MOIDI. le-M? 

EFUNCAT3 30-03 organization of cytoplasm ES- cerevisiaei 
YDL05flw3 fle-07 

EFUNCAT3 Ofl-07 vesicular transport (golgi network-i etc) ES- 
cerevisiaei YDL05flw3 Be-07 

EFUNCAT3 unclassified proteins ES- cerevisiae-i Y0RElbc3 

Se-Qi» 

EFUNCAT3 ll-OI dna repair (direct repair-i base excision repair 
and nucleotide excision repair) ES- cerevisiaei YKRO'lSwI 0-001 
EFUNCAT3 30-10 nuclear organization ES- cerevisiaei YKR0T5w3 
0-001 

EBLOCKSJ PROID^SC 
EBL0CKS3 BP0530fll) 
EBLOCKSJ PR005M3H 
EBL0CKS3 PR00510G 
EBL0CKS3 PR00510E 
EBLOCKSJ BP0ME3bA 
EPIRKU3 RNA binding 3e-0b 
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QIPIRKIO hydroxylysine Ee-ID 

CPIRKIO endoplasmic reticulum ?e-lfl 

(LPIRKIO ATP Se-Db 

EPIRKIO phosphoprotein 3e-Db 

5 ILPIRKIO seed He-3M 

[LPIRKIO saliva Se-ID 

ILPIRKIO glycoprotein Ee-10 

ILPIRKIO heterotrimer 3e-Db 

(LPIRKIO alternative splicing Ee-10 

10 (LPIRKIO P-loop Ee-Db 

(LPIRKIO storage protein Me-3 1 * 

EPIRKIO extracellular matrix Ee-10 

EPIRKIO membrane protein 7e-ia 

(LPIRKIO protein biosynthesis 7e-lfi 

15 CSUPFAI13 myosin motor domain homology Ee-Db 

CSUPFAIin) elastin Be-ID 

ILSUPFAMI glutenin £e-37 

ESUPFAMIB myosin heavy chain Ee-Ob 

(LSUPFAHJ unassigned r ibonucleoprotein repeat-containing proteins 
20 3e-0b 

ESUPFAPU proline-rich protein Ee-1D 

ESUPFAHU ribonucleoprotein repeat homology 3e-0b 

CPR0SITE3 RGD 1 

EKliD All_Alpha 

25 (LKIO L0W_C0MPLEXITY E-64 X 

CKIO C0ILED_C0IL l.flO X 



SE(3 GG(2VEDLSK(3LKRV]>Gi2V(3GIATHV(3HFS(3 ASGLDLAALEWPEEflEVGVRAFDRVRTGSI 

30 SEG 

PRD cccchhhhhhhhhhhhheeeeeeeeeeccccccchhhhhhhhccceeeeeeeeeeecccc 
COILS 



35 SE(3 MKDAAEELSF ARVLLfiRVDELEKLFKDREflFLELVSRKLSLVPGAEEVTIIVTlJEELEflAI 

SEG 

PRD chhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccchhhhhhhhhhhhh 
COILS 



40 

SE(3 TDGWRAS<3AGSETLMGFSKHGGFTSLTSPEGTLSGDSTK<2PSIEflALDSASGLGPDRTAS 

SEG 

PRD hhhhccccccceeeeeccccccccccccccccccccccchhhhhhhhhhccccccceeec 
COILS 

45 . 

SE<2 GSGGTAHPSDGVSSRE(3SKVPSGTGR(2<2(2PRARDEAGVPRLHt2SSTFt2FKSDSDRHRSRE 

SEG 

PRD cccccccccceeeccccccccccccccchhhhhhhhccchhhhhcccccccccccccccc 
50 COILS 



SE(3 KLTST(2PRRNARPGPVt2t3DLPLARDt2PSSVPASflS(3VHLRPDRRGLEPTGnN(3PGLVPAS 

SEG - - - - xxxxxxxxxxxx 

55 PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
COILS 
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SEC T YPHG VVPLSt1GQLGVPPPEI"IDDRELIPFVVDE<2RriLPPSVPGRD(2fiGLELPSTDl3HGLV 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
COILS 

5 

SE<2 SVSAYflHGHTFPGTDflRSriEPLGMDflRGCVISGriGflflGLVPPGIDOaGLTLP VVDflHGLV 
SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
10 COILS 

SE(2 LPFTD<2HGLVSPGLI1PISADi2<2GFV<2PSLEATGFI(3PGTE(2HDLI<2SGRF<2RALV(2RGAY 
SEG 

15 PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
COILS 

SEC (JPGLV(3PGAD(3RGLVRPGnD(2SGLAl3PGAD(3RGLVUPGI1DflSGLA(2PGRD(3HGLIt3PGTG 
20 SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
COILS 

25 SE<2 aHDLVQSGTG(3GVLV£3PGVD(2PGI1V(JPGRF(3RALV(3PGAYi2PGLVt3PGAD(2ID VV(3PGAD 
SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
COILS 

30 

SEfl <2HGLV<2SGADl2SDLA<2PGAVflHGLV<2PGVD<3RGLA<3PRADHflRGLVPPGADtfRGLV<2PGA 
SEG . 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
COILS 

35 

SE(3 D«2HGLV(JPGVDfiHGLAflPGEV(2RSLVt2PGIVf2RGLVOPGAV<2RGLV<2PG AVflRGLVOPGV 
SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
40 COILS 

SEfl DflRGLV<2PGAVi2RGLV(2PGAV<2HGLVl2PGADi2RGLV<aPGVD(2RGLV(2PGVD{3RGLV(2PGn 
SEG 

45 PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
COILS 

SEC D(2RGLIflPGADflPGLVflPGAGflLGMV(2PGIGfl(2GHV<2POADPHGLV<2PGAYPLGLV(2PGA 
50 SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
COILS 

55 SE<2 YLHDLS(2SGTYPRGLV(3PGriD<3YGLR(3PGAY(3PGLIAPGTKLRGSSTF(3ADSTGFISVRP 
SEG .- 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
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SE<2 YdHGflVPPGREdYGdVSPLLASdGLASPGIDRRSLVPPETYflflGLMHPGTDflHSPIPLST 
5 SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
COILS 



10 SE(J GLGSTHPD<2<2HVASPGPGEHD<!VYPDAA<2HGHAFSLFDSHDSnYPGYRGPGYLSAD<2HG(2 
SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
COILS 



15 

SE<2 EGLDPNRTRASDRHGIPAiSKAPGGDVTLFRSPDSVDRVLSEGSEVSSEVLSERRNSLRRH 

SEG xxxxxxxxxxxxxxxxxxxxxxx • 

PRD ccccccccccccccccccccccccceeeeeccccccccccccchhhhhhhhhhhcccccc 
COILS 

20 

SEtJ SSSFPTAVETFHLMGELSSLYVGLKESI1KDLDEE<2AG(2TDLEKI<2FLLAC!I'1VKRTIPPEL 
SEG • 

PRD cccccceeeeeeeeeccceeehhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhcchhh 
25 COILS 



SEfl £JE<3LKTVKTLAKEVWflEKAKVERL<2RILEGEGN(3EAGKELKAGELRL<2LGVLRVTVADIE 
SEG 

30 PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

CCCCCCCCCCCCCCCCCCCCCC 

SEfl KELAELRES(3DRGKAAHENSVSEASLYL(3D(3LDKLRriIIESHLTSSSTLLSMSt1APHKAH 

35 SEG • • - • • • • ..i.r^_.-* - • • • • xxxxxxxxxxxxxx 

PRD hhhhhhhhhhhchhhhhcccchhhhhhhhhhhhhhhhhhhhhhccccceeeehhhhhhhh 
COILS 

CCCCCCCCC 

40 SEC TLAPGflIDPEATCPACSLDVSHflVSTLVRRYE<2L(JD11VNSLAVSRPSKICAKL(2R(3DEELL 
SEG 

PRD hhcccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhh 
COILS 



45 

SEfl GRVflSAIL(3V<2GDCEKLNITTSNLIEDHR<?KflKDIAI1LY<aGLEKLEKEKANREHLEriEID 
SEG 

PRD hhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

50 

SE<3 VKADKSALATKVSRVi2FDATTE<2LNHNM(3ELVAKMSG(2EflDW<2KI1LDRLLTEriDNKLDRL 
SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhh 
55 COILS 



SEC ELDPVK(2LLEDRUKSLR<2<2LRERPPLY£2ADEAAAI1RR(3LLAHFHCLSCDRPLETPVTGHA 
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SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhcccccccccccccee 
COILS 

5 

SEfl IPVTPAGPGLPGHHSIRPYTVFELEOVRflHSRNLKLGSAFPRGDLAflHEflSVGRLRSNHS 
SEG 

PRD- eeeecccccccccccccccchhhhhhhhhhhhhhcccccccccchhhhhhhhhhhhhhhh 
COILS 

10 

SEfl KMLMNIEKVfllHFGGSTKASSflllRELLHAflCLGSPCYKRVTDriADYTYSTVPRRCGGSH 
SEG 

PRD hhhhhheeeeeecccccchhhhhhhhhhhhhhcccccceeeccccccceeeccccccccc 
15 COILS 

SEA TLTYPYHRSRPflHLPRGLYPTEEIfllAMKHDEVDILGLDGHIYKGRMDTRLPGILRKDSS 
SEG 

20 PRD ccccccccccccccccccccchhhhhhhhhcceeeeccccceeeecccccccceeecccc 
COILS 

SEfl GTSKRKSflflPRPHVHRPPSLSSNGflLPSRPflSAfllSAGNTSER 

25 SEG 

PRD cccccccccccccccccccccccccccccccceeeeecccccc 
COILS 

30 

Prosite for DKFZphtes3_16p3.1
PS00016 1542->1545 RGD PDOC00016

(No Pfam data available for DKFZphtes3_16p3.1)
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DKFZphtes3_17i21 



group: transmembrane protein

DKFZphtes3_17i21 encodes a novel 224 amino acid protein without
similarity to known proteins.

The novel protein contains 2 transmembrane regions. ESTs can be
found in testis, retina and brain.
No informative BLAST results; no predictive Prosite, Pfam or SCOP
motif.

The new protein can find application in studying the expression
profile of testis-specific genes and as a new marker for
testicular cells.

unknown protein

Pedant: contains signal peptide (frame 1) and TRANSMEMBRANE 2 (frame 2)

perhaps complete cds.
Sequenced by GBF
Locus: unknown

Insert length: 1518 bp

Poly A stretch at pos. 1480, polyadenylation signal at pos. 1454
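
Positions of this kind can be cross-checked by scanning an insert sequence directly. The sketch below is an illustration under stated assumptions, not the procedure used for this report: it looks for the canonical AATAAA hexamer and for the first run of at least ten adenines, the threshold `min_run` and the function name `find_polya_features` are hypothetical, and the annotation tool behind the reported positions may apply different criteria (so its numbers can differ by a few bases).

```python
import re


def find_polya_features(seq, min_run=10):
    """Return 1-based positions of the first AATAAA hexamer and the first
    run of at least min_run adenines; None where a feature is absent."""
    seq = seq.upper()
    signal = seq.find("AATAAA")
    run = re.search("A{%d,}" % min_run, seq)
    signal_pos = signal + 1 if signal != -1 else None
    run_pos = run.start() + 1 if run else None
    return signal_pos, run_pos


# Illustrative 3' end: 30 bases followed by an 18-base poly(A) tail.
tail = "CGCTAATAAAAGACGATCTGCGTGAACGCC" + "A" * 18
print(find_polya_features(tail))
```

The returned positions are relative to the start of the string passed in; to compare with positions in a full-length insert, add the offset of the first base of the scanned region.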



35 1 GCCAGACAGC TAGGTGTCAT TCAGGGCTGG TGTCCTCTGT CCAGGCCATC 

SI ATGGCCTCCA CTGCCGGCTA CATCGTCTCC ACCTCCTGCA AGCACATCAT 
101 TGATGACCAA CACTGGCTGT CCTCTGCCTA CACGCAATTT GCTGTGCCCT 
151 ACTTCATCTA CGACATCTAC GCCATGTTCC TCTGTCACTG 6CACAAGCAC 
201 CAGGTCAAAG GGCATGGAGG GGACGACGGA GCGGCCAGAG CCCCGGGCAG 

40 251 CACGTGGGCC ATAGCGCGTG GCTACCTGCA CAAGGAGTTC CTCATGGTGC 
3D1 TCCACCATGC CGCCATGGTG CTGGTGTGCT TCCCACTCTC AGTGGTGTGG 
351 CGACAGGGTA AGGGAGACTT CTTTCTGGGT TGCATGTTGA TGGCAGAGGT 
401 CAGCACGCCC TTCGTCTGCC TTGGCAAGAT CCTCATCCAG TACAAGCAGC 
1451 AGCACACACT GCTGCACAAG GTGAACGGGG CCCTGATGCT GCTCAGCTTC 

45 501 CTCTGCTGCC GGGTGCTGCT CTTTCCCTAC CTGTACTGGG CCTACGGGCG 

551 CCATGCCGGC CTGCCCCTGC TGGCCGTGCC CCTGGCGATC CCTGCCCACG 
bOl TCAACCTGGG CGCTGCGCTG CTCCTGGCCC CTCAGCTCTA CTGGTTCTTC 
tSl CTCATCTGCC GTGGGGCCTG CCGCCTCTTC TGGCCCCGCT CCCGGCCGCC 
7D1 CCCGGCCTGC CAGGCCCAGG ACTGAGGCCG GGGGCCGGGA CCCTCCCCCT 

50 751 CCCCACCCCC ACCCCCGTGG AGACAGGGCT CTGGGGCTGA TGGCTGGGGT 
601 TGGGAGCCAG GGTCCTCTTG CCCGGACAAC CCCAGGACTG ACGATGACCC 
A51 CGAAAGGGAA GAGGCCCCAT CTCTCGGGGA CTGAGGGGGT GGAGAGAGGG 
=101 GACCTCTTCC CCCTACTCTG CCCCCTTCCT GCACACCCTT GCGCTGGAGG 
■J51 AGGGGAGGGG GCACCGCCTC CCACCCACTG AGGGCAGGAG GGCT1GTGGG 

55 1001 GAGGGACACC AACAGGGTTT CAAGGGGACC AGGAGTCAGA ATGTGGGGAG 
1051 ACGCCTCTGC CAAGGCCATC CCAGCCCCTA TGCTGCCATC CCCCAGGGCT 
1101 CCCCATCACC CGAGAGGAGA GGACGCCCCA ACTAACCCCC GCTGGCCCTC 
1151 GGGCCTCCCG AGTGGCCGGC TGCAACCACG GCTCCTCTCC AGGGTAGGCC 
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12D1 AGCTTGAGGA 
1251 CTGTTGGGTT 
13D1 CTGACAACCC 
1351 CACATCGCGC 
mOl AGGCTTCCTC 
1M51 CGCTA ATAAA 
15D1 AAAAAAAAAA 



ATCTTATTTA 
GGGGGAAGGA 
CGGGGGCAGG 
CCTCGGGCCC 
CAATCTGGGG 
AGACGATCTG 
AAAAAAAA 



TTTT ATTTAT 
GGTGGCTGCT 
CGAGGGCGCC 
TGCCATGTCC 
TCGGGGGACC 
CGTGAACGCC 



TTACCCA AAT 
ACCCCCAAGC 
CAGTCCCTCA 
CTGGTGCTAC 
CTGGGAGGTG 
AAAAAAAAAA 



PCT/IB01/02050 

TTGAACTAGT 
CTTCCCAGTG 
CCATCGGCTG 
TGACCTCTCA 
CTTTACAGAC 
AAAAAAAAAA 



BLAST Results

No BLAST result

Medline entries

No Medline entry

Peptide information for frame 3

ORF from 51 bp to 722 bp; peptide length: 224
Category: putative protein

Classification: Transmembrane proteins unclassified
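
The relation between ORF coordinates and peptide length is simple arithmetic. A minimal sketch, assuming 1-based inclusive coordinates that exclude the stop codon (a convention inferred from the listed values, not stated in the text); the function name `peptide_length` is hypothetical.

```python
def peptide_length(orf_start, orf_end):
    """Number of codons in an ORF given 1-based inclusive coordinates
    that do not include the stop codon."""
    span = orf_end - orf_start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


print(peptide_length(51, 722))  # 672 nucleotides -> 224 residues
```

The same check applied to an ORF from 8 bp to 811 bp gives 268 residues.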

  1 MASTAGYIVS TSCKHIIDDQ HWLSSAYTQF AVPYFIYDIY AMFLCHWHKH
 51 QVKGHGGDDG AARAPGSTWA IARGYLHKEF LMVLHHAAMV LVCFPLSVVW
101 RQGKGDFFLG CMLMAEVSTP FVCLGKILIQ YKQQHTLLHK VNGALMLLSF
151 LCCRVLLFPY LYWAYGRHAG LPLLAVPLAI PAHVNLGAAL LLAPQLYWFF
201 LICRGACRLF WPRSRPPPAC QAQD

35 

BLASTP hits 

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_17i21, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphtes3_17i21, frame 3

Report for DKFZphtes3_17i21.3



IELENGTH3 E^^ 
into 2522L4-11 



EpIJ T.03 

55 CH0I10L1I TREMBLNEU: AFlfiltMh_l gene: "BcDNA . GH1232L" ; 

product: "BcDNA • GH1232b" 1 Drosophila melanogaster BcDNA . GHD234D 
(BcDNA. GH0234D) mRNA-, complete cds- Te-20 
[BLOCKS! PR0E3b32H 






[BLOCKS} PROO^OMA 

EBL0CKS3) BLD12M3C 

EKIiO TRANSMEMBRANE 2 

EKIiO L0U_C0MPLEXITY t - 25 X 



SE<2 nASTACYIVSTSCKHIIDD(3HWLSSAYT(3FAVPYriYI>IYAnFLCHWHKH(2VK6HGCDD6 

SEG 

PRD cccceeeeeccccceeecchhhhhhhhhhheGGhhhhhhhhhhhhhhhhhhccccccccc 

10 MED 

SE<3 AARAPGSTUAIARGYLHKEPLMVLHHAAnVLVCFPLSVVliJRfiGKGDFFLGCnLIIAEVSTP 

SEG 

PRD ccccccceeeeecccchhhhhhhhhhhhhhhhcccceeeeecccccchhhhhhhhhhccc 

15 mem nnntiMrinnnnnnnnnnn 

SE<3 FVCLGKILI(3YK(3(3HTLLHKVNGALriLLSFLCCRVLLFPYLYUAYGRHAGLPLLAVPLAI 

• • xxxxxxxxxxxx 

PRD ccchhhhhhhhhhhhhhhhccchhhhhhhhhhhhheeGCceeeeccccccccceeeeccc 

20 men mmmmmmmmmmmmmmmmm 

se<2 pahvnlgaalllap(2lywfflicrgacrlfldprsrpppacl2a<2d 

SEG xx 

PRD cchhhhhhhhhhhccceeeeeecccccccccccccccccccccc 



25 MEM 



(No Prosite data available for DKFZphtes3_17i21.3)
(No Pfam data available for DKFZphtes3_17i21.3)

DKFZphtes3_18n14

group: transcription factors

DKFZphtes3_18n14 encodes a novel 377 amino acid protein with
similarity to human giantin.

Giantin is discussed as an autoantigen in rheumatoid arthritis.
The novel protein contains a leucine zipper and a putative
helix-loop-helix DNA-binding domain. Therefore it might be a novel
transcription factor. Most EST hits are from testis and germ
cells.
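
A candidate leucine zipper can be screened for with the heptad-spaced leucine pattern L-x(6)-L-x(6)-L-x(6)-L, which is how the PROSITE leucine zipper pattern is written. The sketch below is an illustration only: the function name `find_leucine_zippers` and the toy input are hypothetical, it is not known which tool flagged the zipper in this clone, and a pattern hit is a weak indicator that also matches many unrelated proteins.

```python
import re

# Lookahead so that overlapping candidate zippers are all reported.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(protein):
    """Return 1-based start positions of L-x(6)-L-x(6)-L-x(6)-L matches."""
    return [m.start() + 1 for m in LEUCINE_ZIPPER.finditer(protein.upper())]


# Toy sequence (assumption: not taken from the clone) with leucines
# at residues 2, 9, 16 and 23.
toy = "M" + "LAAAAAA" * 3 + "LK"
print(find_leucine_zippers(toy))
```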



The new protein can find application in modulation of gene
expression and in expression profiling.

unknown protein

see DKFZphtes3_30i23
wrong orientation
perhaps complete cds.

Sequenced by MediGenomix

Locus: /chromosome="16"

Insert length: 5265 bp

Poly A stretch at pos. 5242, polyadenylation signal at pos. 5257



1 CC66CACCCG GAGCTCCTGG 

35 51 AGAGT6ATGA CTGATGATGA 

101 AGGGTCGGAG CTGGAGCTGC 
151 AGCTCAGCTA TGTAAACTCT 
2D1 AAATATTACG CTAAACTG6A 
251 AGAAATTAAA ATATCAGCAG 

40 3D1 GATCCAAATC CCGGACAGGT 

351 CAAAAACTTG AGCTGGTACA 
M01 ACGACACACA AGGGCAAATG 
MSI TCATTGAGGA GGCTGAAATT 
501 GAGTTTGAAA AAGATATTCT 

45 551 TTTGGCCACT CAGAAAGTGA 

bQl GGGATAATAT GAAGGAGAAA 
tSl CAGAGGAAAA AAATGCTTTT 
701 6GCCCTTCAC GATGTTGATT 
751 TTCTTGAGAC AATTGAAGCA 

50 flOl TCATCTGGAA ACACTCTGCA 
851 CAAGGCAATG GAAATATACC 
•J01 AAGAGCTACT TGAAAAAATT 
151 CGGGCCAAAG CCGAGGCAGT 
1001 GTTCCGGGCA CCACAGGTGA 

55 1051 CGGACCTGGA GAAGAGCATC 
1101 GAGATGTCCT TAAAAGGCCA 
1151 CAATGAGCAG TTGCAGGCAG 
1201 GCCACGGCTT ACAGACCACT 



GCACACGGCA TTGGCAGGGG CCGCTTCGGC 
GTCCGAGAGC GTCCTCTCCG ACTCCCATGA 
CTGTTATCCA GCTGTGCGGG CTGGTGGAGG 
GCTCTCAAAA CTGAGACTGA GATGTTTGAG 
GCCCAGGGAT CAGCGACCTC CACGATTATC 
CAGATTATGC ACAGTTTCGA GGCAGGCGTA 
ATGGACCGTG GGGTAGGCCT GACT6CCGAC 
AAAAGAGGTT GCGGACATGA AGGATGACTT 
CGGAACGCGA CCTGCAGCAT CACGAGGCGA 
CGATGGAGTG AAGTTTCGAG AGAAGTGCAT 
AAAAGCCATA TCCAAGAAGA AAGGGAGTAT 
TGAAATACAT TGAGGACATG AACCGCCGGA 
TTACGTTTGA AAAATGTTTC TCTCAAAGTT 
ACAATTGAGG CAGAAGGAAG AGGTGAGTGA 
TTCAGCAGTT GAA6ATAGAG AACGCTCAAT 
AGGAATCAAG AACTGACCCA GCTAAAGCTG 
GGTTCTCAAT GCCTACAAAA GCAAGCTTCA 
TCAATCTGGA CAAGGAGATC TTGCTGAGAA 
GAAA AAGAAA CACTACAAGT AGAGGAGGAC 
GAATAAGAGG CTCCGGAAGC AGCTGGCCGA 
TGACTTACGT CCGGGAGAAG ATCTTAAATG 
AGGATGTGGG AAAGGAAAGT GGAGATAGCA 
TCGTAAGGCT TGGAATCGAA TGAAAATAAC 
ATTACCTTGC TGGGAAGTAG CCAGAGGCAG 
ACATGACCTA TAAAAGTAAT CAGCTCCTTT 

1251 CTAGTCACGG GCTCCTCTCA CTGTTCCCTG TCTGCCTGGT GTTCCCAACC
1301 CCCCACCCAG GCTGAGTATC ATCTCCTGGG CCACATCTGC CCATGGGGAG
1351 TGTTTTCACA GCCTGGCCCC TGGAACTGTT ACCACTGAAA GAACCACAGG
1401 GCACTCTAAT GGTTTGACAC TTGTTAGCCA GCATTTAGTT CACAAGCATA
1451 GTGAAAGTGA CCTTCCCACA CCTGGGAGAG GGATAGAGGA GGGAGAGCCA
1501 GCCCAGTGTA TGCCATGGGC TTATCCGTGG CAGCCCCAGT GTGCAACTAT
1551 CAAAAACAGA CATCAAAACA GCATGGTGAA TGCCTGGCAC TCAGCATTCT
1601 CAGTTTACTC TTCAGTTTGG TGGGGTAGCT CCTGGACTAG ATACTGCTGC
1651 AAAAGAAAAC AAGCACGAAG GAAACCAAGA TGATTTCTTC GGGCTGATAC
1701 AACCTGTTCT GACCTGCAAA AATCCTACCT TCCCCCACCT CCCCACCGTA
1751 ATAGTCATAG TATAAGGGTT GTACAGACGC CTCAGGAGAC CTGCCTGATT
1801 CCTTTACATC CTTCTCCCTA ACATCTAGAC TATCTCTAGA GCTGTTTCCT
1851 AGTCGTGAAT GCGTGATGGT CCTTCTTTGT CCCTGCAAGT ATGATCCAAC
1901 ATGGCCCAGT TCAGAATCAG AATATGTCTT CTGTGTCATG GTGGCATTTG
1951 GTCCATGGTG GGAGAAAGAA ATCAACTTTT CCCAGTGGTG GAGTGAGGAC
2001 AGGGGAGGGC CGGCCCTCTC AGCCTTGGAT GTGATCCATT TGCTGTAGTC
2051 TTCCACCTTG GTGTACAGAA ACAGGCCAGG GCACGTCTCA CCACCGAAGT
2101 TCAGGACTCC TCTCAGAACC CACAGATCGA ACTGCTGTAG CTGGCACATC
2151 ATTGGGCTTC CTGGGTCCCC CTGTGATAAA AGACAGAAGG CTTCAAGTCT
2201 TAGAAAAACT AGTTTTTGTT GTAAATCTAT CCTTGTGCAA TATACTGTTT
2251 GTTCTAGAAA TGTTTTACGC TGGTTCTCAC TGGAAATGGG GCAAATTATA
2301 GGATACAATT TCAAATCTAG GCAGCCACCA CCACAAATTC CAACAAGATG
2351 ACTTTTCCTT TTATTATGCA AATTAGCTGT GGACTTCTGC TGATTGCCTA
2401 TAGCTTCCTG GTTCATATTT CATTTTCTTG CCCCTTTCCA GTCCTTTGGC
2451 CAAACCTTCC CTCTCTTCTG GCTTCTCATT CCTGAAATGT TGGTGTTTGT
2501 TTCTGTTTTG TCCTGAAATG CTCACATTTT CCCTTCTCTG CCTTGCTTCA
2551 ACCCTTAGTG TAAGCCACTT CCTGCCACCT GGCAACTGCT TACCAGCCTG
2601 GCTGGCCGTG CTCTGGGTCT TCCCTACTCC CAATGGAGCA GTCCTCTGGG
2651 ACTTGGGAAT TCTGCCACAT ACACTTTATC TAACTTAAAG TGACGGAGTA
2701 GAAGCTTGGC ATCATTAGCT AGATATGGGA CCCTGGCAAG TGACCAAATC
2751 CTCTCTGAGC CAAGGTGGGA ACACAGTTAA TGCCTGTAAC ACGTGCTGAG
2801 CACAGCACAG TGCCTGGCAC ACAGCAAACA CTCAATAGAA TATTAGCTAC
2851 CATCATCCTG ATGTCGCTAT AAAGGCCAGC ATTTTTCTGA AAAGTTGGGG
2901 AAAATGGGAA AAGCAACAAG GCAACTAGTA GGTATCACTT ACCTTACCTG
2951 CCCAGACCCC ACACCCCTAG GTCTCCTCTC AAAGGAATTC CTGCCCCTCC
3001 CATGGCCCAT CTTGGTCCGA GAAGGGGGTG GTCATCCCCA GGCTAGCCAG
3051 CCACTTCTGA CCTGTGTGGC CTGCCTGGCT GGAAGGCCCA GGCAATGACA
3101 TGTTGCTCTC GCAGTTTGGA CTGAGACATG GAATGGGGCC GCAATTAACA
3151 ACAGGAAACA ATCTGAACAG ACTGAACCAC GAGCAGCAGA AAGGCAGAAG
3201 AGCAGCCGCT TCAGCCCCTT ACCATCCGAG ACCTGGGTGT GTGGTCTGTC
3251 TTGGTCACTC TCTCTGTCTC TCTTTCTCTC TTTCTTTCTC TGTCCCCAAG
3301 GCTGGAGTGC AGTGGTGCAA TCTTGGCTCA CTGCAACCTC CACCTCTGGG
3351 ATTCAAGCAA TTCTCCCACC TCAGCCTCTC GAGTAGCTGG GGCTACAGCT
3401 ATGCGCCACC ATGCCCAGCT AATTTTTTTT TTTTTTTTTT GAGATGGAGT
3451 CTTGCTCTGT CCCCCATGCT GGAGTGCAGT GGCATGATCT CGGCTCGCTG
3501 CAACCTCCTC CTCCTGGGTT CAAGCGATTC TCCTACCTCA GCCTCCCCAG
3551 TAGCTGGGAT TACAGGCGCC CACCACCACA CCTGGCTAAT TTTTATTTTT
3601 AGTAGAGATG GGGTTTCACC ATGTTGGCCA GGCTGGTCTC GAACTCCTGA
3651 CCTCATGATC CACCCGCCTC GGCCTCCCCA AGTGTTGGGA TTACAGGCGT
3701 GAGCCACTGC ACCCGGCCTA ATTTCTGTAT TTTTAGTAGA GATGGGGTTT
3751 CACGATGTTG GCCAGGCTGG TCTTAATCTA ACTTCAAGTG ATCTGCCCGC
3801 CTCGCCCTCT CAAAGTGCTG GGATTAGGCA TGAACTACCA TGCCCAGTGG
3851 GGTATTCTCT TTCAATAAAG CTCCTCTTTT CCAAGGAAGC CACACCAGAA
3901 CAGAGATGAA GACCAGTGGG AAAACATGGG AGCAACTCCG TGGGCAGGCC
3951 AGCGGGGAGG CCATGCTGCA AAGCTGCCGT GATTCCCTGG TGATCTCTCA
4001 GCAGGCCAAG GCCAGACATG TGAGGAAGGC CTTGAGGACT TCATTCTGTG
4051 CCTCTCCTTG GATGGAAGGG GGTGCTTTAG TGTGGCACTC CTGACTTTTC
4101 AATTGACTGG TGAAGAGGCC CTTGTGTGCA CCTCACTATG TCTGCCTAGG
4151 TCATGGGGGC TCCCTGGCCA AGAATGACGT GGTTCCCCCT TTCATCAGTC
4201 CGATTCGCAG TTTGTCTTAA CTGTAGTGGT ATAGCCAGAG CAAGAAAAAG
4251 AATGTGATTT AGGACAAATG ATTGGATGAG TGATTGGTAG ATGTCCTCAG
4301 CTATGGCGTG GTTTTGCAGG TCACTGTTCG ACCCACCTGG GCACAGCATA
4351 TACGCTTTTT CTCTTCCCCA TAATCCCGTA GGGGCTGCGA CTTCTGAAGC
4401 ACAAGAGGCA GAGGCGAACA GCTCCAGGTG CCCCTCTGGA GCTACCCTAC
4451 CTCATCTCCC AAGGGAGCGG CCACAGCCCA GAGTGGGGTC TTTCATTTTG
4501 TGATCTTTTC CCTTGACATT CAGCAAAAGC CCTGACAGTG GTAGAATAAA
4551 GGCAGGATGG GTGAGTGCAG AGTGATTCTG CTTTTGTTGG GTTTCAGGGA
4601 AACCCATAGG CAGATTCTGA ACCTGGTGGT TGATTCTACA TGTGGGAATT
4651 GTGGCTTTGA AGACCTCTGG ACATGAGAAC ATATTTCCAA GACAGAGGAT
4701 TCTATGGGGA CGSGTCACCA TTAAATGGTG TGCAAGCATA ATTCTGTTCA
4751 AAAATGAAGG CATGTTTAGA GGTGTGTCAC AGTTAAAAAC CAACCTGAAC
4801 TTTGCAGTTA GATTTTAAAA GATGGTCAGT TAGAGTAGAA ATAGCTTAGA
4851 ATATTCCATT GAGTCTAAGA TACAGTTAGA AATCAACATC TTTGAAATTA
4901 GGGTGTGTCT TTTAATCAGT TGATGTCAGA GTTTAACGGG CAGCATTTTT
4951 TTCTTTCTTG GGATTACAAA AAATGATGGT GCATTCTATA ATTGGCAGCA
5001 TCTTAGATCT GAGGAAGTAT GATACTTGTT TGACGGAATG GTTGACGGCA
5051 GAATTTTGTT AAAAAGCTAT ATCTTCACTG TATTTTAACA CATTATCTAA
5101 TTTAAGAAAT TGTTAAGATC CCCCACCTGG CAGAGGACCC AGTACAAAAT
5151 AGGCACTCAA TAGATGTTAC ACCAACTTTG GAAGGGCAAA CATATTTCTT
5201 AATGAGAGGC AGTCCTTCAT GTTTTGCAAT AAAATGACTT TTAAAAAAAA
5251 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AA



BLAST Results 

No BLAST result 

Medline entries 

No Medline entry 

Peptide information for frame 3 



ORF from 57 bp to 1187 bp; peptide length: 377 
Category: putative protein 
Classification: no clue 
Prosite motifs: LEUCINE_ZIPPER (19-40) 
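The relation between an ORF's start coordinate, its reading frame and its peptide length can be checked with a minimal sketch. It assumes 1-based inclusive coordinates, an ORF end that excludes the stop codon, and frames numbered 1 to 3 from the first base; none of these conventions is stated in the listing, and the function name `orf_summary` is hypothetical.

```python
def orf_summary(start: int, peptide_length: int) -> dict:
    """Derive frame and end coordinate of an ORF.

    Assumptions (not stated in the listing): coordinates are 1-based and
    inclusive, and the reported end excludes the stop codon.
    """
    nucleotides = 3 * peptide_length          # one codon per residue
    end = start + nucleotides - 1             # last base of the last codon
    frame = (start - 1) % 3 + 1               # frame 1, 2 or 3
    return {"frame": frame, "end": end, "nucleotides": nucleotides}


# Start position and peptide length as read from this entry.
print(orf_summary(57, 377))
```

Under these assumptions a start at 57 bp with a 377-residue peptide falls in frame 3, which agrees with the "frame 3" heading of this entry.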



1 MTDDESESVL SDSHEGSELE LPVKQLCGLV EELSYVNSAL KTETEMFEKY
51 YAKLEPRDQR PPRLSEIKIS AADYAQFRGR RRSKSRTGMD RGVGLTADQK
101 LELVQKEVAD MKDDLRHTRA NAERDLQHHE AIIEEAEIRW SEVSREVHEF
151 EKDILKAISK KKGSILATQK VMKYIEDMNR RRDNMKEKLR LKNVSLKVQR
201 KKMLLQLRQK EEVSEALHDV DFQQLKIENA QFLETIEARN QELTQLKLSS
251 GNTLQVLNAY KSKLHKAMEI YLNLDKEILL RKELLEKIEK ETLQVEEDRA
301 KAEAVNKRLR KQLAEFRAPQ VMTYVREKIL NADLEKSIRM WERKVEIAEM
351 SLKGHRKAWN RMKITNEQLQ ADYLAGK

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_lflnm, frame 3 

No Alert BLASTP hits found 

Pedant information for DKFZphtes3_lflnm, frame 3 

Report for DKFZphtes3_lflnm.3 

[LENGTH] 395 
[MW] Mtl51.ll> 
[pI] T-l? 
[HOMOL] TREMBL: AF136711_1 product: "myosin heavy chain"; Amoeba proteus myosin heavy chain mRNA, complete cds. 5e-06 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, Y0R51bc] 7e-04 
[BLOCKS] BL00563B Stathmin family proteins 
[BLOCKS] PRDCH15D 
[PROSITE] LEUCINE_ZIPPER 1 
[PFAM] Helix-loop-helix DNA-binding domain 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 6.33 % 
[KW] COILED_COIL 14.68 % 

SEQ GTRSSWAHGIGRGRFGRVMTDDESESVLSDSHEGSELELPVKQLCGLVEELSYVNSALKT 
SEG 
PRD cccccccccccccceeeeeccccceeeeeccccccceeeeeeeeccchhhhhhhhhhhhh 
COILS 

SEQ ETEMFEKYYAKLEPRDQRPPRLSEIKISAADYAQFRGRRRSKSRTGMDRGVGLTADQKLE 
SEG 
PRD hhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhccchhhhhccccccccchhhhhhh 
COILS 

SEQ LVQKEVADMKDDLRHTRANAERDLQHHEAIIEEAEIRWSEVSREVHEFEKDILKAISKKK 
SEG xxxxxxxxx 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhc 
COILS 
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ GSILATQKVMKYIEDMNRRRDNMKEKLRLKNVSLKVQRKKMLLQLRQKEEVSEALHDVDF 
SEG 
PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

SEQ QQLKIENAQFLETIEARNQELTQLKLSSGNTLQVLNAYKSKLHKAMEIYLNLDKEILLRK 
SEG xxxxxxx 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 
C 

SEQ ELLEKIEKETLQVEEDRAKAEAVNKRLRKQLAEFRAPQVMTYVREKILNADLEKSIRMWE 
SEG xxxxxxxxx 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 
CCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ RKVEIAEMSLKGHRKAWNRMKITNEQLQADYLAGK 
SEG 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccc 
COILS 

Prosite for DKFZphtes3_16nm.3 
PS00029 37->58 LEUCINE_ZIPPER PDOC00029 

Pfam for DKFZphtes3_lflnm.3 

HMM_NAME Helix-loop-helix DNA-binding domain 

HMM *RRRNHNI1RERRRRndINNli)FeaLRDHIPHhnV. . . PNEKPLSKVEILRM 
RRR Nil E+ R++++ + * ++++ +E L V+ ++ 
Query 188 RRR-DNMKEKLRLKNVSLKVQRKKMLLQL-RQKEEVSEA-LHDVDFQQL 243 

HMM AIEYIrsLQ* 
IE ++L+ 
Query 244 KIENAQFLE 252 

group: testis derived 

DKFZphtes3_npl2 encodes a novel 644 amino acid protein without 
similarity to known proteins. 

No informative BLAST results; No predictive prosite, pfam or SCOP 
motif. 

The new protein can find application in studying the expression 
profile of testis-specific genes. 

unknown protein 
Sequenced by MediGenomix 

Locus: unknown 

Insert length: 2161 bp 

Poly A stretch at pos. 2086, no polyadenylation signal found 



1 CCCGAGCCAG CAACCCTGAG GGGCGGCCGG GCAGCGCCGC CACCATGTTC
51 CTGGGCACCG GGGAGCCGGC CTTGGACACG AGTCACCTTA TCTCTCTAAG
101 CCGAGCGTCC CTGACCCCGC AGAAGCTGTG GCTGGGAACC GCAAAGCCAG
151 GAAGTCTGAC CCAGGCCCTG AACTCACCCC TCACCTGGGA GCATGCGTGG
201 ACTGGCGTCC CCGGCGGCAC TCCTGACTGT CTGACAGACA CCTTCAGAGT
251 GAAGAGGCCA CATCTCAGGC GCTCTGCCAG CAACGGTCAT GTCCCTGGGA
301 CTCCTGTCTA CAGAGAAAAA GAAGATATGT ATGACGAGAT TATTGAGTTA
351 AAGAAGTCAT TGCACGTGCA GAAGAGCGAC GTGGACCTGA TGAGAACGAA
401 GCTCCGGCGC CTGGAGGAGG AAAACAGCAG GAAGGACCGG CAGATAGAGC
451 AGCTCCTGGA TCCCAGCCGC GGCACGGATT TTGTTCGGAC TCTGGCAGAG
501 AAAAGGCCCG ATGCCAGTTG GGTCATTAAC GGGCTGAAGC AGAGGATCCT
551 GAAGCTGGAA CAGCAGTGCA AGGAGAAGGA CGGCACCATC AGCAAACTCC
601 AGACCGATAT GAAGACTACC AACCTGGAAG AGATGCGGAT CGCCATGGAG
651 ACATACTACG AGGAGGTGCA TCGTCTCCAG ACCCTCTTGG CAAGTTCTGA
701 AACCACCGGA AAGAAGCCCC TGGGGGAGAA GAAGACGGGC GCCAAAAGGC
751 AGAAGAAGAT GGGCAGTGCC CTCCTGAGCT TGTCCCGGAG TGTCCAGGAG
801 CTCACGGAAG AGAACCAGAG CCTGAAGGAG GACCTGGACC GCGTGCTGAG
851 CACCTCCCCA ACCATCTCCA AGACACAGGG TTATGTGGAG TGGAGCAAGC
901 CCCGGCTGCT GAGGCGCATT GTGGAGCTGG AGAAGAAACT AAGTGTGATG
951 GAGAGCTCAA AATCACACGC CGCAGAGCCA GTCAGATCAC ACCCGCCAGC
1001 CTGCCTTGCA TCCAGCTCTG CGCTGCACAG ACAGCCACGA GGGGACCGCA
1051 ACAAGGACCA CGAGCGTCTC CGAGGGGCTG TGAGAGACCT GAAGGAAGAG
1101 CGGACCGCGC TGCAGGAGCA GCTGCTGCAG AGAGATTTGG AGGTGAAGCA
1151 GCTCCTGCAG GCGAAGGCCG ACCTGGAGAA GGAGCTGGAG TGCGCGAGGG
1201 AGGGCGAGGA GGAGAGGAGA GAGCGAGAGG AGGTTTTGAG AGAGGAGATT
1251 CAGACACTTA CCAGCAAGCT CCAAGAATTG CAAGAAATGA AGAAAGAAGA
1301 GAAAGAGGAT TGCCCGGAAG TTCCTCATAA GGCCCAAGAG CTCCCAGCTC
1351 CCACTCCCAG CAGCAGGCAC TGCGAGCAAG ACTGGCCGCC GGATTCCAGC
1401 GAGGAGGGGC TCCCGCGGCC CCGCTCCCCC TGCTCTGATG GGAGAAGAGA
1451 CGCCGCGGCC AGAGTCCTGC AGGCCCAGTG GAAGGTGTAC AAGCACAAGA
1501 AAAAAAAGGC TGTTCTGGAT GAGGCGGCTG TGGTGCTTCA GGCAGCTTTC
1551 AGGGGACATC TCACGCGGAC AAAGCTCTTA GCAAGCAAAG CACATGGCTC
1601 AGAGCCACCC AGCGTGCCAG GCCTCCCAGA CCAGAGCTCT CCTGTGCCCC
1651 GCGTTCCGAG CCCCATCGCC CAGGCCACGG GCAGCCCTGT GCAGGAGGAG
1701 GCCATCGTCA TCATCCAGTC CGCTCTGCGG GCACACCTGG CCCGGGCCAG
1751 GCACAGTGCT ACCGGTAAAA GAACCACCAC CGCAGCTTCT ACCAGGAGGA
1801 GATCGGCTTC AGCCACACAC GGGGACGCCT CCTCCCCACC CTTCCTCGCA
1851 GCTCTTCCTG ACCCCTCTCC CTCAGGGCCA CAGGCCTTGG CACCTCTACC
1901 TGGGGATGAC GTCAACTCCG ATGATTCCGA CGATATTGTC ATTGCACCGT
1951 CTCTGCCCAC GAAGAACTTT CCAGTTTAGG TCCCCGTCAC TGTCTCCACG
2001 CCGTGATGGC AGCGCTGCCG AGGACATAGG AACCACGACT GGAAAGATAA
2051 TTTATCGTGT TAGGAGAAGA ACGATGATAC CTACTTAAAA AAAAAAAAAA
2101 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
2151 AAAAAAAAAA A

BLAST Results 

No BLAST result 

Medline entries 

No Medline entry 

Peptide information for frame 3 

ORF from 45 bp to 1976 bp; peptide length: 644 
Category: similarity to unknown protein 
Classification: unclassified 
Prosite motifs: RGD (332-334) 

1 MFLGTGEPAL DTSHLISLSR ASLTPQKLWL GTAKPGSLTQ ALNSPLTWEH
51 AWTGVPGGTP DCLTDTFRVK RPHLRRSASN GHVPGTPVYR EKEDMYDEII
101 ELKKSLHVQK SDVDLMRTKL RRLEEENSRK DRQIEQLLDP SRGTDFVRTL
151 AEKRPDASWV INGLKQRILK LEQQCKEKDG TISKLQTDMK TTNLEEMRIA
201 METYYEEVHR LQTLLASSET TGKKPLGEKK TGAKRQKKMG SALLSLSRSV
251 QELTEENQSL KEDLDRVLST SPTISKTQGY VEWSKPRLLR RIVELEKKLS
301 VMESSKSHAA EPVRSHPPAC LASSSALHRQ PRGDRNKDHE RLRGAVRDLK
351 EERTALQEQL LQRDLEVKQL LQAKADLEKE LECAREGEEE RREREEVLRE
401 EIQTLTSKLQ ELQEMKKEEK EDCPEVPHKA QELPAPTPSS RHCEQDWPPD
451 SSEEGLPRPR SPCSDGRRDA AARVLQAQWK VYKHKKKKAV LDEAAVVLQA
501 AFRGHLTRTK LLASKAHGSE PPSVPGLPDQ SSPVPRVPSP IAQATGSPVQ
551 EEAIVIIQSA LRAHLARARH SATGKRTTTA ASTRRRSASA THGDASSPPF
601 LAALPDPSPS GPQALAPLPG DDVNSDDSDD IVIAPSLPTK NFPV



BLASTP hits 
No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_npl2, frame 3 

Pedant information for DKFZphtes3_npl2, frame 3 

Report for DKFZphtes3_npl2.3 



ELENGTH3 
EMIiO 
EpIJ 
DLH0I10L3 
n KIAA1023 
partial c 
EFUNCAT3 
YDLDSflwIB 
EFUNCAT3 
cerevisia 
EFUNCATJ 
3e-0b 
EFUNCATJ 
YDR3SbwJ 
EFUNCATH 
Ee-05 
EFUNCATJ 
YDR35bwJ 
EFUNCATJ 
YJR13HC3 
EBL0CKS3 
EBL0CKS3 
EBLOCKSJ 
EBL0CKS3 
EBL0CKS3 
EBLOCKSJ 

EBLOCKSID 

EEC! 

EPIRKliO 

EPIRKIO 
: EPIRKliO 

EPIRKIO 

EPIRKLO 

EPIRKliO 

EPIRKliO 

EPIRKIO 

EPIRKIO 

EPIRKbO 

EPIRKbO 

EPIRKliO 

EPIRKliO 

ESUPFAI1J 

ESUPFAIiJ 

ESUPFAPO 

ESUPFAIIJ 

ESUPFAPO 

ESUPFAHJ 

EPROSITEJ 

EKLO 

EKUO 



bMM 

7iaio.m 

fi.fiO 

TREMBL: AB0Ea^Hb_l gene: n KIAA10E3"=, product: 
protein"; Homo sapiens mRNA for KIAA10E3 protein-i 
ds- D-D 

30-03 organization of cytoplasm ES. cerevisiae-. 
Ee-07 

Oa-O? vesicular transport (golgi network-, etc.) ES. 
e-. YDLOSflwJ Ee-Q? 

unclassified proteins ES- cerevisiae-i YLR3DTc3 



ES. cerevisiae-. 



30. OH organization of cytoskeleton 
£e-0S 

0^.10 nuclear biogenesis ES - cerevisiae-. YDR3Sbwl 



ES. cerevisiae-. 
ES. cerevisiae-. 



03. EE cell cycle control and mitosis 
Ee-OS 

^fl classification not yet clear-cut 
He-OS 

Dnoissm 

BLD0b27B GHMP kinases ATP-binding domain proteins 
BLD032bC Tropomyosins proteins 
BLOllbOB Kinesin light chain repeat proteins 
BLOOaEOD Glucoamylase proteins region proteins 
BP0Hm?C 

BLOOmEB Neuromodulin (GAP-H3) proteins 
3-fci.l". 32 Myosin ATPase 3e-06 
tandem repeat 3e-0a 
transmembrane protein Ee-07 
muscle contraction 3e-oa 
actin binding 3e-0a 
ATP 3e-0a 

thick filament 3e-0a 
alternative splicing 7e-07 
coiled coil 3e-0a 
P-loop 3e-0a 
heptad repeat Ee-07 
methylated amino acid 3e-0a 
hydrolase 3e-0B 
Golgi apparatus Ee-07 
myosin heavy chain 3e-0fi 
myosin motor domain homology 3e-0a 
alpha-actinin actin-binding domain homology ae-0b 
plectin ae-0b 

ribosomal protein SID homology ae-Db 
giantin Ee-07 
RGD 1 
All_Alpha 

LOW_COMPLEXITY 14.60 % 
[KW] COILED_COIL 15.22 % 

SEQ MFLGTGEPALDTSHLISLSRASLTPQKLWLGTAKPGSLTQALNSPLTWEHAWTGVPGGTP 
SEG 
PRD cccccccccccceeeeeeeecccccceeeeecccccceeeeccccccccccccccccccc 
COILS 

SEQ DCLTDTFRVKRPHLRRSASNGHVPGTPVYREKEDMYDEIIELKKSLHVQKSDVDLMRTKL 
SEG 
PRD cccccchhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 
CCCCCCCCCCCCCCCCCCCCCCC 

SEQ RRLEEENSRKDRQIEQLLDPSRGTDFVRTLAEKRPDASWVINGLKQRILKLEQQCKEKDG 
SEG 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 
CCCCCC 

SEQ TISKLQTDMKTTNLEEMRIAMETYYEEVHRLQTLLASSETTGKKPLGEKKTGAKRQKKMG 
SEG 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 
CCCC 

SEQ SALLSLSRSVQELTEENQSLKEDLDRVLSTSPTISKTQGYVEWSKPRLLRRIVELEKKLS 
SEG 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ VMESSKSHAAEPVRSHPPACLASSSALHRQPRGDRNKDHERLRGAVRDLKEERTALQEQL 
SEG 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

SEQ LQRDLEVKQLLQAKADLEKELECAREGEEERREREEVLREEIQTLTSKLQELQEMKKEEK 
SEG xxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxx 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 
CCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ EDCPEVPHKAQELPAPTPSSRHCEQDWPPDSSEEGLPRPRSPCSDGRRDAAARVLQAQWK 
SEG x x 
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 
CCCCCC 

SEQ VYKHKKKKAVLDEAAVVLQAAFRGHLTRTKLLASKAHGSEPPSVPGLPDQSSPVPRVPSP 
SEG xxxxxxxx 
PRD hhhhhhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhcccccccccccccccccccccccc 
COILS 

SEQ IAQATGSPVQEEAIVIIQSALRAHLARARHSATGKRTTTAASTRRRSASATHGDASSPPF 
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 
PRD ccccccccccceeeehhhhhhhhhhhhhhhhccccceeehhhhhhhhhccccccccccce 
COILS 

SEQ LAALPDPSPSGPQALAPLPGDDVNSDDSDDIVIAPSLPTKNFPV 
SEG xxxxxxxxxxxxx... 
PRD eeecccccccccccccccccccccccccceeeeecccccccccc 
COILS 

Prosite for DKFZphtes3_npl2.3 
PS00016 332->334 RGD PDOC00016 

(No Pfam data available for DKFZphtes3_npl2.3) 

group: transmembrane protein 

DKFZphtes3_20hl2 encodes a novel 1204 amino acid protein without 
similarity to known proteins. 

The novel protein contains 1 transmembrane region and two leucine 
zippers. 
No informative BLAST results; No predictive prosite, pfam or SCOP 
motif. 

The new protein can find application in studying the expression 
profile of testis-specific genes and as a new marker for 
testicular cells. 

putative protein 

perhaps complete cds. 
Pedant: TRANSMEMBRANE 1 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 5894 bp 
Poly A stretch at pos. 5874, no polyadenylation signal found 



1 CTCTGCCTTT CCTCTCGCAG CCACCCTTCC TCTCAGACCA GTACGGTGGC
51 CGACGGGAGT CAGACGCTGG GGATGAATGA AGGATCAACA AACAGTAATA
101 ATGACTGAAT GTACAAGTCT TCAGTTTGTC AGCCCTTTTG CTTTTGAGGC
151 AATGCAGAAG GTGGATGTTG TTTGCCTGGC ATCTTTAAGT GATCCAGAAT
201 TAAGACTTCT TCTGCCCTGT TTGGTACGGA TGGCACTTTG TGCACCTGCT
251 GACCAGAGCC AAAGCTGGGC TCAGGATAAG AAACTCATCC TTCGCCTTCT
301 TTCTGGAGTG GAAGCTGTCA ACTCCATTGT TGCATTGTTG TCCGTGGACT
351 TTCATGCTTT AGAACAAGAT GCCAGCAAAG AACAGCAGCT TAGGCATAAA
401 CTTGGAGGAG GCAGTGGAGA GAGCATCCTG GTATCACAGC TTCAGCATGG
451 ACTGACGTTA GAGTTTGAAC ACAGTGATTC ACCTCGTCGA TTGCGTCTTG
501 TGCTTAGTGA ACTGTTGGCA ATTATGAACA AGGTGTCTGA GTCCAACGGA
551 GAATTTTTTT TCAAGTCTTC TGAACTTTTT GAGAGTCCAG TATATTTGGA
601 GGAAGCTGCA GATGTACTTT GTATTTTACA AGCAGAGCTC CCTTCCTTGC
651 TCCCTATAGT TGATGTAGCT GAAGCTTTGC TACATGTTAG AAATGGTGCC
701 TGGTTCTTGT GTCTCTTGGT GGCCAATGTT CCTGATAGTT TTAATGAAGT
751 TTGTAGGGGC CTGATAAAAA ATGGAGAACG ACAAGATGAA GAAAGTCTTG
801 GAGGAAGGCG CAGGACAGAT GCCTTACGCT TCTTGTGTAA AATGAATCCT
851 TCTCAGGCCC TCAAGGTCCG AGGCATGGTG GTGGAAGAAT GTCACTTGCC
901 AGGCCTTGGT GTGGCTTTGA CATTGGATCA TACTAAAAAT GAAGCTTGTG
951 AGGATGGAGT GAGTGACTTG GTTTGTTTTG TAAGTGGTTT GCTTCTTGGA
1001 ACAAATGCGA AAGTCCGGAC TTGGTTTGGA ACTTTTATCC GAAATGGACA
1051 GCAGAGAAAA AGAGAGACCA GCAGTTCTGT CCTTTGGCAG ATGAGAAGGC
1101 AGCTTCTTCT GGAGTTGATG GGCATTCTTC CCACAGTAAG AAGCACCCGA
1151 ATTGTGGAAG AAGCTGATGT GGATATGGAG CCCAATGTGT CTGTGTATTC
1201 GGGGCTGAAA GAAGAGCATG TTGTGAAAGC CAGTGCACTC TTACGTCTGT
1251 ACTGTGCTTT GATGGGGATC GCTGGACTCA AACCAACTGA AGAAGAAGCT
1301 GAGCAATTAC TGCAGTTGAT GACGAGCCGT CCTCCTGCTA CGCCAGCTGG
1351 GGTTCGCTTT GTTTCACTTT CCTTTTGTAT GCTACTGGCC TTTTCTACAC
1401 TTGTCAGTAC ACCTGAACAG GAGCAGCTGA TGGTGGTGTG GCTAAGTTGG
1451 ATGATAAAAG AAGAAGCGTA TTTTGAGAGT ACTTCAGGCG TCTCTGCTTC
1501 TTTTGGGGAG ATGTTATTAT TGGTGGCTAT GTACTTTCAC AGCAACCAGC
1551 TTAGTGCTAT CATTGACTTG GTCTGTTCCA CTTTGGGGAT GAAGATTGTA
1601 ATTAAGCCAA GCTCCTTGAG CAGGATGAAG ACAATCTTCA CACAGGAAAT
1651 TTTTACTGAG CAGGTTGTCA CAGCTCATGC AGTTCGGGTC CCTGTCACCA
1701 GCAACCTGAG TGCCAACATT ACTGGATTTT TGGCTATTCA TTGTATTTAC
1751 CAGCTTCTCA GGAGCCGTTC CTTTACCAAG CACAAAGTGT CAATAAAAGA
1801 TTGGATTTAT AGACAGCTGT GTGAAACCTC TACTCCACTT CATCCTCAAT
1851 TACTTCCTTT GATTGATGTG TACATAAATT CTATACTTAC TCCTGCGTCG
1901 AAATCTAATC CAGAAGCCAC AAATCAGCCA GTCACAGAAC AGGAGATACT
1951 CAATATTTTC CAAGGAGTCA TTGGGGGTGA CAACATCCGC CTTAATCAGC
2001 GTTTCAGTAT CACAGCACAG CTTTTGGTGC TCTACTATAT ACTGTCTTAT
2051 GAAGAGGCTC TTCTAGCAAA CACGAAGACT TTAGCTGCCA TGCAAAGAAA
2101 GCCCAAATCA TATTCTTCTT CTTTAATGGA TCAGATTCCT ATCAAATTCC
2151 TTATTCGACA GGCTCAAGGG CTGCAGCAGG AGTTGGGAGG GTTGCATTCA
2201 GCTTTACTAC GTCTCCTTGC TACTAACTAC CCACATTTAT GTATTGTGGA
2251 TGACTGGATT TGTGAAGAAG AAATCACAGG GACTGATGCC CTGCTACGGC
2301 GAATGCTCCT GACTAATAAT GCTAAAAATC ATTCTCCCAA ACAACTCCAA
2351 GAAGCATTTT CAGCTGTCCC AGTAAATCAC ACACAAGTGA TGCAGATTAT
2401 AGAACACTTG ACTCTACTCT CTGCCAGTGA ACTTATACCA TATGCGGAAG
2451 TGTTAACATC CAATATGAGC CAGCTATTGA ATTCAGGGGT TCCACGGAGA
2501 ATTCTGCAAA CAGTCAATAA ACTATGGATG GTTCTTAATA CTGTGATGCC
2551 TAGAAGGCTA TGGGTAATGA CGGTTAATGC ACTTCAGCCT TCAATAAAGT
2601 TTGTACGACA ACAAAAGTAT ACTCAGAATG ACCTGATGAT AGATCCTCTC
2651 ATTGTCCTAA GGTGTGATCA GAGGGTTCAC AGATGCCCCC CACTGATGGA
2701 TATTACCCTA CACATGTTGA ATGGATATCT TCTTGCATCT AAAGCCTACC
2751 TTAGTGCTCA TCTGAAGGAA ACAGAGCAAG ATAGGCCTTC CCAGAATAAT
2801 ACAATTGGTT TAGTTGGACA AACTGATGCT CCGGAAGTTA CCAGGGAAGA
2851 ATTGAAAAAT GCATTACTGG CCGCTCAGGA TAGTGCAGCT GTCCAGATTC
2901 TCTTAGAGAT TTGCCTACCT ACTGAAGAGG AGAAAGCAAA TGGTGTCAAT
2951 CCAGATAGCT TGTTAAGAAA TGTTCAAAGT GTTATTACCA CCAGCGCTCC
3001 AAATAAGGGA ATGGAGGAAG GAGAAGACAA TTTGCTCTGT AACCTTCGAG
3051 AAGTTCAGTG CCTTATCTGT TGTCTCTTGC ACCAAATGTA CATTGCAGAT
3101 CCCAACATTG CTAAGCTTGT TCACTTTCAG GGTTATCCAT GTGAACTTTT
3151 GCCTCTGACG GTCGCAGGTA TTCCATCTAT GCACATCTGT CTAGATTTCA
3201 TACCTGAGCT TATTGCACAG CCAGAACTTG AGAAACAGAT ATTTGCTATC
3251 CAGTTGCTTT CTCACTTGTG TATACAATAT GCATTACCAA AGTCACTTAG
3301 TGTGGCTCGT TTAGCTGTCA ATGTCATGGG AACTTTGTTA ACAGTTTTAA
3351 CACAGGCTAA GCGGTATGCT TTTTTTATGC CAACTCTGCC AAGTTTGGTC
3401 TCTTTTTGTC GAGCATTTCC TCCATTGTAT GAGGATATTA TGTCTTTGCT
3451 GATCCAAATA GGGCAAGTTT GTGCCTCTGA TGTTGCCACT CAGACAAGAG
3501 ACATTGATCC AATTATTACA CGTCTTCAAC AAATAAAGGA GAAACCAAGT
3551 GGATGGTCTC AAATCTGTAA AGATTCATCT TATAAAAATG GATCCAGGGA
3601 CACTGGAAGC ATGGATCCTG ATGTACAGCT CTGTCACTGT ATTGAAAGAA
3651 CAGTAATTGA AATAATAAAT ATGAGTGTTA GTGGAATTTA AAACAAAATT
3701 TAAAACAACA AAAAGTTGTT TGCTGCATAT ACCCAACATG AATCTGCATA
3751 TTAGTAACAA CTCTAAACTG AATGGGAACA GTAAAGTATT GTCTTGGAAT
3801 CACTAAAACA ATTCAATTCA ACATGAGTAT AGTTTAGAAC TTTATGAGAA
3851 TTATGCTTGC TTGTTTCTGA TTGGCACATC TTTGGATCTA CTTTGCTGAT
3901 ATGTTTCTAT TGTAGCAGCT GAGCTTTTTT TTTTTCCACT GGGAACACAT
3951 GTAAGAAACT CATTATTGGA AAGGGAATTT GGCCTTGTAT TTAGCTTTTG
4001 AAGTGAAGAC TGCCATGCCT TTAATTTCTT ATAAAAATGA GTCTGTGGGT
4051 AGCCCTAGTG TTTATTTTAA CTGTGAGCTT GTAACAGAAT GTGACAAAGA
4101 TGCAAAGATG GGAGAGGAAA AAAGGGTAAA GGGAAAGGAG AATTAAGGAA
4151 ATAATAGGAG TTAAAAACAC AAGTAGAAAT CTCAAAGATT TGCAGTGCAA
4201 GTAATAGTAA TGCAAGTTGG AATTCTAGTT CTCAAGAAAG AGTATTGAGA
4251 AGACTTTTAA AAAGGCAAGT AGCTTTTGTA AATGATTTCT GTGGAAATAC
4301 AGATGAGGAT TTAAAGATTT CACATATTTG CTTCAATTTT TATTAATATA
4351 TGAAGCCATA TGTTTAAAGA GATACTTGAA TAATTTGGAA TTTTAAGATA
4401 CTGGTGTAAA AGTGTTTACA GAAACATCTT TGTTCAAAGA AGAACCTGAG
4451 AGATCTCATT TAGTTTTATG TTTTAAATTT ATTTTTATAA TGCTTTATTA
4501 ACTTACCTAA TGCTCAGAGG GGGGAAATAT GTATCAAATT AAATGAAGGT
4551 AGAGCAATAA AACCCACTGG ATTAAAGAGC TCTTGGTTTG TCATCAGGAT
4601 TATAATTCAT ATCTTACTTT GAGAAGATCT TTGAGTAAGA AAATGCAGTG
4651 TTTGAACCTG AGGAAAAGTT AAAGTGTAGA AAATATTGTC TTGCCGAAGG
4701 ATTTTGCAGT CCTCTGTCAG TAACTTCCAT TGATTAGGCA GACATATTCA
4751 GGTAAACCCT AATCATTAAA AAAAAATTAT CAATGTAGAA AGTAATTCCC
4801 TTTTTTCTCT CTGAGATATA CCTCAATCAC ACACTTCCCC ACCCCCACTT
4851 GAAACAGACC TCTTCACTTG TGTTTTTTTT TTTTTTTTCC TGAGGTGGAG
4901 TCTTCCCCTG TTGCCCAGGC TGGAGTGCAG TGGGATGATC TTGGCTCACT
4951 GCAACTTCTG CCACCTGGGT TCAAGGGATT CTCGTGCCTC AACCTCCTGA
5001 GTAGCTGGGA CTGCAGGCAC GCGCCACCTG TATTTTTGTA TTTTTAGTAG
5051 AGACGGGGGT TTGCCATGTT GCCCAGACTG GTTTTGAACT CCTGGCCTCA
5101 GGTGATCTGC CCACCTTGGC CTCCCAAAGT GGTGGGATTA CAGGTGTGAG
5151 CCACCGCACC TGGCCAGACC GCTTCACTTG TAAAAGAAAT TAGGCTAATA
5201 AGAAGGTGTA GTTTTTGAGA AATGAAATTT AACTTTAGCC TTTTCACTAG
5251 TAAATAGTCA CATCTCATTT TCTTCCTTXG TAAAATGGGG TTACTACTGG
5301 CCCTACCTCA TATTCTATGA GAATGAGTTT GTAGGTGTTT CAAATCATGA
5351 AGTGCATAGT ATCACATGTG ATAGAATATT TATAACTTTT TATTAGATGC
5401 TTAATGTTCA ATTAAGTAAT TTTGATGTGA AAAATAAAAG TAATAAAAGT
5451 ATCTTAAAAA TAGCATAAGA ATTTTCATAT TTTTAAACAA GGCAGTTTTG
5501 TAGTCCCTTA AGATTAAATA CAACTGCTCC TTTTTTTTTT AAACTGAGGC
5551 CTTGCGATAT TTTGTGTGAA TAGATATGCC CTAGGAGTTC AGAAAAAGTT
5601 AAAAGTATGT TTTCTAATTA AATGCAGTGC ACATTCCTGG ATCAATATTC
5651 AAAGACTGGT CATAACCTGC TGTGTTAAAA TAATCACATA TGCTCTTTTT
5701 CATCAGATTT GTTGATGATG TAAATAAAAT GTGTAAATAT ATTAGTAAAT
5751 GTTAATATTC ATGTATTTTA AGTTAAGGTT ATAAAATTTG TCACAATGTG
5801 TTTTTTTATT CAAGTGAAAA CAGATGTGTG CAGCTATTTT GAATATTGGT
5851 TTATAAACAT TCATATTCTT TATCAAACAA AAAAAAAAAA AAAA



BLAST Results 

No BLAST result 

Medline entries 

No Medline entry 

Peptide information for frame 2 

ORF from 77 bp to 3688 bp; peptide length: 1204 
Category: putative protein 
Classification: unclassified 

Prosite motifs: LEUCINE_ZIPPER (167-188) 
LEUCINE_ZIPPER (692-713) 

WO 01/98454 PCT/IB01/02050 



1 MKDQQTVIMT ECTSLQFVSP FAFEAMQKVD VVCLASLSDP ELRLLLPCLV
51 RMALCAPADQ SQSWAQDKKL ILRLLSGVEA VNSIVALLSV DFHALEQDAS
101 KEQQLRHKLG GGSGESILVS QLQHGLTLEF EHSDSPRRLR LVLSELLAIM
151 NKVSESNGEF FFKSSELFES PVYLEEAADV LCILQAELPS LLPIVDVAEA
201 LLHVRNGAWF LCLLVANVPD SFNEVCRGLI KNGERQDEES LGGRRRTDAL
251 RFLCKMNPSQ ALKVRGMVVE ECHLPGLGVA LTLDHTKNEA CEDGVSDLVC
301 FVSGLLLGTN AKVRTWFGTF IRNGQQRKRE TSSSVLWQMR RQLLLELMGI
351 LPTVRSTRIV EEADVDMEPN VSVYSGLKEE HVVKASALLR LYCALMGIAG
401 LKPTEEEAEQ LLQLMTSRPP ATPAGVRFVS LSFCMLLAFS TLVSTPEQEQ
451 LMVVWLSWMI KEEAYFESTS GVSASFGEML LLVAMYFHSN QLSAIIDLVC
501 STLGMKIVIK PSSLSRMKTI FTQEIFTEQV VTAHAVRVPV TSNLSANITG
551 FLPIHCIYQL LRSRSFTKHK VSIKDWIYRQ LCETSTPLHP QLLPLIDVYI
601 NSILTPASKS NPEATNQPVT EQEILNIFQG VIGGDNIRLN QRFSITAQLL
651 VLYYILSYEE ALLANTKTLA AMQRKPKSYS SSLMDQIPIK FLIRQAQGLQ
701 QELGGLHSAL LRLLATNYPH LCIVDDWICE EEITGTDALL RRMLLTNNAK
751 NHSPKQLQEA FSAVPVNHTQ VMQIIEHLTL LSASELIPYA EVLTSNMSQL
801 LNSGVPRRIL QTVNKLWMVL NTVMPRRLWV MTVNALQPSI KFVRQQKYTQ
851 NDLMIDPLIV LRCDQRVHRC PPLMDITLHM LNGYLLASKA YLSAHLKETE
901 QDRPSQNNTI GLVGQTDAPE VTREELKNAL LAAQDSAAVQ ILLEICLPTE
951 EEKANGVNPD SLLRNVQSVI TTSAPNKGME EGEDNLLCNL REVQCLICCL
1001 LHQMYIADPN IAKLVHFQGY PCELLPLTVA GIPSMHICLD FIPELIAQPE
1051 LEKQIFAIQL LSHLCIQYAL PKSLSVARLA VNVMGTLLTV LTQAKRYAFF
1101 MPTLPSLVSF CRAFPPLYED IMSLLIQIGQ VCASDVATQT RDIDPIITRL
1151 QQIKEKPSGW SQICKDSSYK NGSRDTGSMD PDVQLCHCIE RTVIEIINMS
1201 VSGI



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_20hl2, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_20hl2, frame 2 

Report for DKFZphtes3_20hl2.2 

[LENGTH] 1204 
[MW] 134347.53 
[pI] 5.75 
[HOMOL] TREMBL:CEZC376_3 gene: "ZC376.6"; Caenorhabditis elegans cosmid ZC376 2e-22 
[PROSITE] LEUCINE_ZIPPER 2 
[KW] TRANSMEMBRANE 1 
[KW] LOW_COMPLEXITY 2.57 % 
[KW] COILED_COIL 2.33 % 
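The keyword percentages in these reports appear to equal the share of residues flagged in the corresponding per-residue track (for example, 28 `C` marks in the COILS track of a 1204-residue protein). That reading is an inference from the numbers in the listing, not a documented definition, and the helper name below is hypothetical. A minimal sketch of the check:

```python
def flagged_percentage(track: str, length: int, flag: str) -> float:
    """Percentage of residues marked with `flag` in a per-residue track.

    Assumption: the report rounds the percentage to two decimals.
    """
    flagged = sum(1 for ch in track if ch == flag)
    return round(100.0 * flagged / length, 2)


# COILS track of this entry: a single run of 28 flagged residues.
coils_track = "C" * 28
print(flagged_percentage(coils_track, 1204, "C"))
```

The same function can be applied to the SEG track with `flag="x"` to compare against the LOW_COMPLEXITY value.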

SEQ MKDQQTVIMTECTSLQFVSPFAFEAMQKVDVVCLASLSDPELRLLLPCLVRMALCAPADQ 
SEG 
PRD cccceeeeeeeccceeecchhhhhhhhheeeeeeecccchhhhhhhchhhhhhhhccccc 
COILS 
MEM 

SEQ SQSWAQDKKLILRLLSGVEAVNSIVALLSVDFHALEQDASKEQQLRHKLGGGSGESILVS 
SEG 
PRD hhhhhhhhhhhhhhhccccccccccccccccchhhhhhhhhhhhhhhhhcccccceeeec 
COILS 
MEM 

SEQ QLQHGLTLEFEHSDSPRRLRLVLSELLAIMNKVSESNGEFFFKSSELFESPVYLEEAADV 
SEG xxxxxxxxxxxx 
PRD ccccccceeeecccccchhhhhhhhhhhhhhhhhhcccccccccccccccchhhhhhhhh 
COILS 
MEM 

SEQ LCILQAELPSLLPIVDVAEALLHVRNGAWFLCLLVANVPDSFNEVCRGLIKNGERQDEES 
SEG 
PRD hhhhhhcccccchhhhhhhhhhhhhccchhhhhheeeccccccchhhhhccccccccccc 
COILS 
MEM 

SEQ LGGRRRTDALRFLCKMNPSQALKVRGMVVEECHLPGLGVALTLDHTKNEACEDGVSDLVC 
SEG 
PRD ccccchhhhhhhhhhcccceeeeeeeeeeeeeccccccceeeecccccccccccccceee 
COILS 
MEM 

SEQ FVSGLLLGTNAKVRTWFGTFIRNGQQRKRETSSSVLWQMRRQLLLELMGILPTVRSTRIV 
SEG 
PRD GeecccccccgeeeeeGeeeeecchhhhhcccchhhhhhhhhhhhhhhhccccceeeeee 
COILS 
MEM 

SEQ EEADVDMEPNVSVYSGLKEEHVVKASALLRLYCALMGIAGLKPTEEEAEQLLQLMTSRPP 
SEG 
PRD eeeccccccceeeeccccchhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhcccc 
COILS 
MEM 

SEQ ATPAGVRFVSLSFCMLLAFSTLVSTPEQEQLMVVWLSWMIKEEAYFESTSGVSASFGEML 
SEG 
PRD cccceeeeeehhhhhhhhhhccccccchhhhhhhhhhhhhhhhhhhcccccccchhhhhh 
COILS 
MEM MMMMMMMMMMMMMMMMM 

SEQ LLVAMYFHSNQLSAIIDLVCSTLGMKIVIKPSSLSRMKTIFTQEIFTEQVVTAHAVRVPV 
SEG 
PRD hhhhhhhccchhhhhhhhhhhhccceeeeeccccchhhhhhhhhhhhhhhhhhhhheeec 
COILS 
MEM 

SEQ TSNLSANITGFLPIHCIYQLLRSRSFTKHKVSIKDWIYRQLCETSTPLHPQLLPLIDVYI 
SEG 
PRD cccccceegeeehhhhhhhhhhhhhcccccccchhhhhhhhhcccccccccccccceeee 
COILS 
MEM 

SEQ NSILTPASKSNPEATNQPVTEQEILNIFQGVIGGDNIRLNQRFSITAQLLVLYYILSYEE 
SEG 
PRD eeccccccccccccccccchhhhhhhhhhhccccccceeeghhhhhhhhhhhhhhhhhhh 
COILS 
MEM 

SEQ ALLANTKTLAAMQRKPKSYSSSLMDQIPIKFLIRQAQGLQQELGGLHSALLRLLATNYPH 
SEG xxxxxxxxxxxxxx'xxxxx 
PRD hhhhhhhhhhhhhhcccccccccccchhhhhhhhhhhhhhhhhcccchhhhhhhhhcccc 
COILS 
CCCCCCCCCCCCCCCCCCCCCCCCCCCC 
MEM 

SEQ LCIVDDWICEEEITGTDALLRRMLLTNNAKNHSPKQLQEAFSAVPVNHTQVMQIIEHLTL 
SEG 
PRD eeeecceeeeechhhhhhhhhhhhhhccccccccchhhhhhhhhhcccchhhhhhhhhhh 
COILS 
MEM 

SEQ LSASELIPYAEVLTSNMSQLLNSGVPRRILQTVNKLWMVLNTVMPRRLWVMTVNALQPSI 
SEG 
PRD hhhhhhhhhhcccccchhhhhccccchhhhhhhhhhhhhhhhccccchhhhhhcccccch 
COILS 
MEM 

SEQ KFVRQQKYTQNDLMIDPLIVLRCDQRVHRCPPLMDITLHMLNGYLLASKAYLSAHLKETE 
SEG 
PRD hhhhhhhcccccccccceeeeecccccccccccceeeccccccchhhhhhhhhhhhhhhh 
COILS 
MEM 

SEQ QDRPSQNNTIGLVGQTDAPEVTREELKNALLAAQDSAAVQILLEICLPTEEEKANGVNPD 
SEG 
PRD ccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccccc 
COILS 
MEM 

SEQ SLLRNVQSVITTSAPNKGMEEGEDNLLCNLREVQCLICCLLHQMYIADPNIAKLVHFQGY 
SEG 
PRD cceeeeeeeeeecccccccccccchhhhhhhhhhhhhhhhhhhhhhhccccceeeecccc 
COILS 
MEM 

SEQ PCELLPLTVAGIPSMHICLDFIPELIAQPELEKQIFAIQLLSHLCIQYALPKSLSVARLA 
SEG 
PRD ccceeeeeeeecccceeeeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhh 
COILS 
MEM 

SEQ VNVMGTLLTVLTQAKRYAFFMPTLPSLVSFCRAFPPLYEDIMSLLIQIGQVCASDVATQT 
SEG 
PRD hhhhhhhhhhhhhhhhhhhhccccccceeeccccccchhhhhhhhhhhhcchhhhhcccc 
COILS 
MEM 

SEQ RDIDPIITRLQQIKEKPSGWSQICKDSSYKNGSRDTGSMDPDVQLCHCIERTVIEIINMS 
SEG 
PRD cccchhhhhhhhhhhccccceeeeeccccccccccccccccceeeeeeeehhhhhhheee 
COILS 
MEM 

SEQ VSGI 
SEG .... 
PRD eccc 
COILS 
MEM 



Prosite for DKFZphtes3_20hl2.2 

PS00029 167->188 LEUCINE_ZIPPER PDOC00029 

PS00029 692->713 LEUCINE_ZIPPER PDOC00029 

(No Pfam data available for DKFZphtes3_20hl2.2) 

DKFZphtes3_21km 

group: testis derived 

DKFZphtes3_21klM encodes a novel SSfl amino acid protein without 
similarity to known proteins. 

No informative BLAST results; No predictive prosite, pfam or SCOP 
motif. 

The new protein can find application in studying the expression 
profile of testis-specific genes. 

unknown protein 
perhaps complete cds. 
Sequenced by LNU 
Locus: unknown 
Insert length: 2547 bp 

Poly A stretch at pos. 2506, polyadenylation signal at pos. 2479 



1 GGCCACGTTC AGCGGACACG GGAGCAAGAT GGCGATTCCG GGCAGGCAGT
51 ATGGGCTTAT TTTGCCAAAG AAAACACAGC AGTTGCACCC TGTTTTGCAA
101 AAACCATCAG TGTTTGGGAA TGATTCTGAT GATGATGATG AGACCTCTGT
151 GAGTGAAAGC CTTCAGAGGG AAGCTGCTAA GAAGCAGGCC ATGAAACAGA
201 CCAAACTGGA AATCCAGAAG GCCCTTGCAG AAGATGCTAC TGTGTATGAA
251 TATGACAGTA TTTATGATGA AATGCAGAAA AAAAAGGAGG AAAATAATCC
301 CAAATTGCTT TTGGGGAAAG ACAGAAAGCC CAAGTATATT CACAACTTGC
351 TAAAAGCAGT TGAGATCAGA AAAAAGGAAC AGGAAAAAAG AATGGAAAAG
401 AAAATACAGA GAGAACGAGA AATGGAAAAG GGGGAGTTTG ATGATAAAGA
451 AGCATTTGTG ACATCTGCAT ATAAGAAAAA ACTGCAAGAG AGAGCTGAAG
501 AAGAAGAAAG AGAAAAGAGG GCTGCTGCAC TGGAAGCATG TTTGGATGTA
551 ACCAAGCAGA AAGATCTCAG TGGATTTTAT AGGCACCTAT TAAATCAAGC
601 AGTTGGTGAA GAGGAAGTAC CTAAATGCAG CTTTCGTGAA GCCAGATCTG
651 GTATAAAGGA AGAAAAATCA AGGGGCTTCT CCAATGAAGT AAGTTCAAAA
701 AACAGAATAC CACAAGAGAA ATGCATTCTT CAAACTGATG TGAAAGTAGA
751 GGAAAACCCA GATGCAGACA GTGACTTCGA TGCTAAGAGC AGTGCGGATG
801 ATGAAATAGA AGAAACTAGA GTGAACTGCA GAAGGGAAAA GGTCATAGAG
851 ACCCCTGAGA ATGACTTCAA GCACCACAGG AGTCAAAACC ACTCTCGGTC
901 ACCTAGTGAA GAAAGAGGGC ACAGTACCAG GCACCACACG AAAGGATCAC
951 GAACGTCGAG AGGACATGAG AAAAGGGAAG ATCAGCACCA GCAGAAGCAA
1001 TCCAGAGACC AAGAGAACCA TTACACTGAC CGTGATTACC GGAAAGAAAG
1051 GGATTCTCAT AGGCACAGAG AGGCCAGTCA TAGAGATTCC CATTGGAAGA
1101 GGCATGAACA GGAAGATAAA CCAAGGGCGA GGGACCAAAG AGAAAGAAGT
1151 GACAGAGTAT GGAAAAGGGA GAAAGATAGG GAGAAATATT CCCAAAGAGA
1201 ACAAGAAAGA GATAGACAAC AAAATGATCA GAACCGACCC AGTGAGAAAG
1251 GAGAGAAGGA AGAGAAAAGC AAAGCAAAGG AAGAGCATAT GAAAGTAAGG
1301 AAGGAAAGAT ATGAAAATAA TGATAAATAC AGAGATAGAG AAAAACGAGA
1351 GGTAGGTGTT CAGTCTTCAG AAAGAAATCA AGACAGAAAG GAAAGCAGCC
1401 CAAATTCTAG GGCAAAGGAT AAATTTCTTG ACCAAGAAAG ATCCAACAAA
1451 ATGAGAAACA TGGCAAAGGA CAAAGAAAGA AACCAAGAGA AACCCTCTAA

1501 
1S51 
IbOl 
IbSl 
1701 
17S1 
1601 
1651 

noi 
nsi 

BDOl 
E051 
S1D1 
H1S1 
EE01 
2251 
E301 
5351 
S»401 
E451 
E501 



TTCTGAATCA 
AGAAGGGTAA 
AAGCGGAACA 
CAGGCAGATG 
ATTGATGGCT 
TTCCTGGAAC 
TGGAGGGCAT 
TTACAGCTTG 
GTGTACTCAT 
TGTGCTAATT 
CCATTAAAAA 
AACATTTTGG 
CCAGCCTAGG 
CCTAGCATGG 
CAAGAAGATC 
ATGCCACTGC 
AATTTTTTTT 
GAAATGTATT 
ATGTTCTGAA 
TAGCTTACAG 
ATTTGGAAAA 



TCACTGGG AG 
AGAACAAGAG 
ATGAAGAA AC 
GCGCGGGTTA 
ACCCCAAGAG 
CTGCTGCGTA 
TTTTAAATTT 
GATGTTTGGA 
CAATACCACA 
CCCTGTAGGT 
ATAATTTTGG 
GAGGCCAAGG 
CAGGATAATA 
TAGTCCATGC 
ACTTGAGCCT 
ACTCCAACCT 
AAATAAATAA 
TCAGATAAAA 
ATTTGTATTT 
CATAGTTGGC 
AAAAAAAAAA 



CAAAACACAG 
AGACCACCTG 
TGTAATGTCA 
ATGCAAAGAC 
AAAGATTTAA 
AAACCAT AAA 
ATTTTCAAA A 
TGTGGATGTT 
TTCTTTGTTG 
ACATAATGAG 
CCAGATACGG 
CAGAAGGATA 
AGACCTTGTC 
CTGTAGTCCC 
AGGAATTTGA 
GGGCAACAGA 
TTTAACTCTT 
TATGGATTTG 
AAGTATAAAA 
TTAAATGAAA 
AAAAAAAAAA 



ACTCACAGAG 
AGGCAGTGAG 
GCT AGAGACA 
CTATATTGAG 
GGAAGCACAG 
GGAGTGTGTT 
TTTTAAGTTA 
TGGCTGAATT 
TATTCAAGAA 
GAAAATTTGC 
TAGCTCGTGC 
TTGAGGCTAG 
TCTATTTAAA 
AGCTGTTCGA 
TGTTACAGTG 
ATGAGACCCT 
CTAATAATGT 
AAAAACAGAA 
T6TGAATCAT 
ATAAAATGAT 
AAAAAAAAAA 



PCT/IBO 1/02050 

GAAGGGCAAG 
CAAGTTTGCA 
GGTACTTGGC 
AAAGAAGATG 
AAAACTGTAA 
ACCAGTAGTT 
AAA6TCAGTC 
TATATATAGT 
CCGTTAAGAG 
TCCACTACAA 
CTGTAATACC 
GCATTCAAGA 
AAACAAAAAG 
GAGGCTGAGG 
AGGTATGATC 
GTCTCTAAAA 
TTTGTTGCAG 
AATATACTTT 
CTTGTCTAAA 
ATGCTTATAC 
AAAAAAG 



BLAST Results

No BLAST result

Medline entries

No Medline entry

Peptide information for frame 2

ORF from 29 bp to 1702 bp; peptide length: 558
Category: similarity to unknown protein
Classification: Nucleic acid management

1 MAIPGRQYGL ILPKKTQQLH PVLQKPSVFG NDSDDDDETS VSESLQREAA
51 KKQAMKQTKL EIQKALAEDA TVYEYDSIYD EMQKKKEENN PKLLLGKDRK
101 PKYIHNLLKA VEIRKKEQEK RMEKKIQRER EMEKGEFDDK EAFVTSAYKK
151 KLQERAEEEE REKRAAALEA CLDVTKQKDL SGFYRHLLNQ AVGEEEVPKC
201 SFREARSGIK EEKSRGFSNE VSSKNRIPQE KCILQTDVKV EENPDADSDF
251 DAKSSADDEI EETRVNCRRE KVIETPENDF KHHRSQNHSR SPSEERGHST
301 RHHTKGSRTS RGHEKREDQH QQKQSRDQEN HYTDRDYRKE RDSHRHREAS
351 HRDSHWKRHE QEDKPRARDQ RERSDRVWKR EKDREKYSQR EQERDRQQND
401 QNRPSEKGEK EEKSKAKEEH MKVRKERYEN NDKYRDREKR EVGVQSSERN
451 QDRKESSPNS RAKDKFLDQE RSNKMRNMAK DKERNQEKPS NSESSLGAKH
501 RLTEEGQEKG KEQERPPEAV SKFAKRNNEE TVMSARDRYL ARQMARVNAK
551 TYIEKEDD
BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_21k14, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphtes3_21k14, frame 2

Report for DKFZphtes3_21k14.2

[LENGTH] 567
[MW]
[pI] 8.96
[HOMOL] TREMBL:AC006233_14 gene: "F12K2.14"; Arabidopsis thaliana
chromosome II BAC F12K2 genomic sequence, complete sequence. 3e-11
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YKR092c] 1e-05
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR092c] 1e-05
[FUNCAT] 06.07 protein modification (glycosylation, acylation,
myristylation, palmitylation, farnesylation and processing)
[S. cerevisiae, YKL201c] 1e-04
[BLOCKS] PF00746F
[BLOCKS] BL01182E Glycosyl hydrolases family 35 proteins
[EC] 2.7.1.37 Protein kinase 7e-06
[EC] 5.99.1.2 DNA topoisomerase 4e-06
[PIRKW] phosphotransferase 7e-06
[PIRKW] pre-mRNA splicing 1e-06
[PIRKW] citrulline 3e-06
[PIRKW] tandem repeat 3e-06
[PIRKW] DNA binding 4e-06
[PIRKW] DNA replication 4e-06
[PIRKW] isomerase 4e-06
[PIRKW] ATP 3e-06
[PIRKW] phosphoprotein 1e-06
[PIRKW] calcium binding 3e-06
[PIRKW] alternative splicing 7e-06
[PIRKW] P-loop 3e-06
[PIRKW] EF hand 3e-06
[PIRKW] hair 3e-06
[SUPFAM] DEAD/H box helicase homology 3e-06
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 4e-06
[SUPFAM] calmodulin repeat homology 3e-06
[SUPFAM] unassigned ribonucleoprotein repeat-containing proteins 1e-06
[SUPFAM] unassigned DEAD/H box helicases 3e-06
[SUPFAM] trichohyalin 3e-06
[SUPFAM] protein kinase homology 4e-06
[SUPFAM] eukaryotic type I DNA topoisomerase 4e-06
[SUPFAM] ribonucleoprotein repeat homology 1e-06
[KW] All_Alpha
[KW] LOW_COMPLEXITY 22.75 %
SEQ ATFSGHGSKMAIPGRQYGLILPKKTQQLHPVLQKPSVFGNDSDDDDETSVSESLQREAAK
SEG xxxxxxxxxxxxx
PRD ccccccccccccccccceeeeccccccccccccccccccccccccccccchhhhhhhhhh

SEQ KQAMKQTKLEIQKALAEDATVYEYDSIYDEMQKKKEENNPKLLLGKDRKPKYIHNLLKAV
SEG xxxxxxxxxxxxxx
PRD hhhhhhhhhhhhhhhhhhcccccccccch^

SEQ EIRKKEQEKRMEKKIQREREMEKGEFDDKEAFVTSAYKKKLQERAEEEEREKRAAALEAC
SEG xxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxx xxxxxxxxxxxxx .
PRD hhhhhhhhhhhhhhhhhhhhhhccccccccchhhhhhhhhhhhhhhhh

SEQ LDVTKQKDLSGFYRHLLNQAVGEEEVPKCSFREARSGIKEEKSRGFSNEVSSKNRIPQEK
SEG
PRD hhhhhhhccchhhhhhhhhhhhcccccccc

SEQ CILQTDVKVEENPDADSDFDAKSSADDEIEETRVNCRREKVIETPENDFKHHRSQNHSRS
SEG xxxxxxxxxxxxxx
PRD hhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhh

SEQ PSEERGHSTRHHTKGSRTSRGHEKREDQHQQKQSRDQENHYTDRDYRKERDSHRHREASH
SEG
PRD cccccbhhhhhhhhhhhcccchhhhhhhhh

SEQ RDSHWKRHEQEDKPRARDQRERSDRVWKREKDREKYSQREQERDRQQNDQNRPSEKGEKE
SEG xxxxxxxxxxxxxxx • -xxxxxx
PRD hhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccc

SEQ EKSKAKEEHMKVRKERYENNDKYRDREKREVGVQSSERNQDRKESSPNSRAKDKFLDQER
SEG xxxxxxxxxxx
PRD hhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhchhh^

SEQ SNKMRNMAKDKERNQEKPSNSESSLGAKHRLTEEGQEKGKEQERPPEAVSKFAKRNNEET
SEG xxxxxxxxxxx
PRD hhhhhhhhhhhhhhhhccchhhhhccchhhhhhhhhhhhhhhccccchh

SEQ VMSARDRYLARQMARVNAKTYIEKEDD
SEG
PRD hhhhhhhhhhhhhhhhhchhhhhcccc

(No Prosite data available for DKFZphtes3_21k14.2)
(No Pfam data available for DKFZphtes3_21k14.2)
DKFZphtes3_22i11

group: testis derived

DKFZphtes3_22i11 encodes a novel 580 amino acid protein with
similarity to RCC1-like G exchanging factor RLG, UVR8
(UVB-resistance protein) of Arabidopsis thaliana and to the murine
retinitis pigmentosa GTPase regulator.
No informative BLAST results; no predictive Prosite, Pfam or SCOP
motifs.

The new protein can find application in studying the expression
profile of testis-specific genes.

Homo sapiens chromosome 7q22 sequence, ORF4, extension

differences to genmodel of ORF4 -> differential splicing

Sequenced by LMU

Locus: /map="7q22"
Insert length: 2236 bp

Poly A stretch at pos. 2197, polyadenylation signal at pos. 2180

1 ACAATGCTCA GATCGGGAGG TGGAGCCAAT CAGGTCCAAC CAAGAGGAGG
51 GGACACCGGC ACTCCACTAG CAGGAAAACG GGCCGAGGGA CCGCAAGCAG
101 GGGGTGCCTA GTCCTCGTCC CCCAAAGACC AATCGTAAGC CAGATACAGG
151 CGAGTGACTG TCAAGAAGGC CAATTAGAGC CTCCGAAGGG AATCTGGACC
201 TGCCTCTTCT CTGAGGGACG GCTCTACCTA CCAATAGCAT GGGCGAGAAG
251 GCGGTCCCTT TGCTAAGGAG GAGGCGGGTG AAGAGAAGCT GCCCTTCTTG
301 TGGCTCGGAG CTTGGGGTTG AAGAGAAGAG GGGGAAAGGA AATCCGATTT
351 CCATCCAGTT GTTCCCCCCA GAGCTGGTGG AGCATATCAT CTCATTCCTC
401 CCAGTCAGAG ACCTTGTTGC CCTCGGCCAG ACCTGCCGCT ACTTCCACGA
451 AGTGTGCGAT GGGGAAGGCG TGTGGAGACG CATCTGTCGC AGACTCAGTC
501 CGCGCCTCCA AGATCAGGGT TCTGGAGTCC GGCCCTGGAA GAGAGCTGCC
551 ATTCTGAACT ACACGAAGGG CCTGTATTTC CAGGCATTTG GAGGCCGCCG
601 CCGATGTCTC AGCAAGAGCG TGGCCCCCTT GCTAGCCCAC GGCTACCGCC
651 GCTTCTTGCC CACCAAGGAT CACGTCTTCA TTCTTGACTA CGTGGGGACC
701 CTCTTCTTCC TCAAAAATGC CCTGGTCTCC ACCCTCGGCC AGATGCAGTG
751 GAAGCGGGCC TGTCGCTATG TTGTGTTGTG TCGTGGAGCC AAGGATTTTG
801 CCTCGGACCC AAGGTGTGAC ACAGTTTACC GTAAATACCT CTACGTCTTG
851 GCCACTCGGG AGCCGCAGGA AGTGGTGGGT ACCACCAGCA GCCGGGCCTG
901 TGACTGTGTT GAGGTCTATC TGCAGTCTAG TGGGCAGCGG GTCTTCAAGA
951 TGACATTCCA CCACTCAATG ACCTTCAAGC AGATCGTGCT GGTTGGTCAG
1001 GAGACCCAGC GGGCTCTACT GCTCCTCACA GAGGAAGGAA AGATCTACTC
1051 TTTGGTAGTG AATGAGACCC AGCTTGACCA GCCACGCTCC TACACGGTTC
1101 AGCTGGCCCT GAGGAAGGTG TCCCACTACC TGCCTCACCT GCGCGTGGCC
1151 TGCATGACTT CCAACCAGAG CAGCACCCTC TACGTCACAG ACCAGGGGGG
1201 AGTGTATTTT GAGGTGCATA CCCCAGGGGT GTATCGCGAT CTCTTTGGGA
1251 CCCTTCAAGC CTTTGACCCC CTGGACCAGC AGATGCCGCT TGCTCTCTCA
1301 CTGCCTGCCA AGATCCTATT CTGTGCTCTT GGCTACAACC ACCTTGGCCT
1351 GGTGGATGAA TTTGGCCGAA TCTTCATGCA AGGAAATAAC AGATACGGGC
1401 AGCTAGGAAC AGGGGACAAA ATGGACCGAG GGGAACCCAC ACAGGTTTGT
1451 TACCTGCAGC GGCCCATCAC CCTGTGGTGC GGCCTCAACC ACTCCCTGGT
1501 GCTGAGCCAG AGCTCAGAGT TCAGCAAGGA GCTGCTGGGC TGCGGCTGTG
1551 GGGQ1GGGGG CCGCCTCCCA GGCTGGCCCA AGGGGAGTGC CTCCTTCGTC
1601 AAGCTCCAAG TCAAGGTCCC TCTGTGTGCC TGTGCCCTCT GTGCCACCAG
1651 GGAGTGCCTA TACATCCTGT CCAGCCACGA CATTGAGCAG CACGCCCCCT
1701 ATCGCCACCT GCCAGCCAGC AGGGTGGTGG GGACTCCTGA GCCCAGCCTG
1751 GGGGCCAGAG CACCCCAGGA CCCCGGGGGG ATGGCCCAGG CCTGCGAGGA
1801 GTACCTCAGC CAGATCCACA GTTGCCAAAC GTTGCAGGAC CGCACGGAGA
1851 AGATGAAGGA GATCGTAGGG TGGATGCCCC TGATGGCCGC ACAGAAGGAC
1901 TTCTTCTGGG AGGCCCTGGA CATGCTGCAG AGGGCTGAAG GAGGQGGGGG
1951 TGGTGTAGGG CCCCCAGCCC CTGAGACCTA ATCCCCCTCA TGCTAGCCTA
2001 GTCCCTGGAG GAGGGAGTCC GGCCCCAGGC CAGGGACTAA GGAGCAATGA
2051 CCATTGTGCA CATGCGTGTG GGAAGGGGTT GCTAGGGGGT GGGGACGGCT
2101 AACCAGGGTA AGAATGTTCA GGGGGCTGCC CAGGAGGGGC CCCCAACCTG
2151 ACTATCATGG ACAAGAGATT TGATGGATAG AATAAAAGGC TGCAGCGAAA
2201 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAG



BLAST Results 



Entry AF053356 from database EMBL:
Homo sapiens chromosome 7q22 sequence, complete sequence.
Score = 2=5521 P = D.De + DCK identities = h^b/lB^ 
10 exons 



Medline entries

No Medline entry 



Peptide information for frame 2 



ORF from 239 bp to 1978 bp; peptide length: 580
Category: similarity to unknown protein
Classification: no clue

1 MGEKAVPLLR RRRVKRSCPS CGSELGVEEK RGKGNPISIQ LFPPELVEHI
51 ISFLPVRDLV ALGQTCRYFH EVCDGEGVWR RICRRLSPRL QDQGSGVRPW
101 KRAAILNYTK GLYFQAFGGR RRCLSKSVAP LLAHGYRRFL PTKDHVFILD
151 YVGTLFFLKN ALVSTLGQMQ WKRACRYVVL CRGAKDFASD PRCDTVYRKY
201 LYVLATREPQ EVVGTTSSRA CDCVEVYLQS SGQRVFKMTF HHSMTFKQIV
251 LVGQETQRAL LLLTEEGKIY SLVVNETQLD QPRSYTVQLA LRKVSHYLPH
301 LRVACMTSNQ SSTLYVTDQG GVYFEVHTPG VYRDLFGTLQ AFDPLDQQMP
351 LALSLPAKIL FCALGYNHLG LVDEFGRIFM QGNNRYGQLG TGDKMDRGEP
401 TQVCYLQRPI TLWCGLNHSL VLSQSSEFSK ELLGCGCGAG GRLPGWPKGS
451 ASFVKLQVKV PLCACALCAT RECLYILSSH DIEQHAPYRH LPASRVVGTP
501 EPSLGARAPQ DPGGMAQACE EYLSQIHSCQ TLQDRTEKMK EIVGWMPLMA
551 AQKDFFWEAL DMLQRAEGGG GGVGPPAPET
BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_22i11, frame 2

TREMBL:AF053356_11 product: "ORF4"; Homo sapiens chromosome 7q22
sequence, complete sequence. N = 1, Score = 1554, P = 1.6e-159

TREMBL:AF130441_1 gene: "UVR8"; product: "UVB-resistance protein
UVR8"; Arabidopsis thaliana UVB-resistance protein UVR8 (UVR8) mRNA,
complete cds. N = 1, Score = 109, P = 0.0062

TREMBL:AF044677_1 gene: "Rpgr"; product: "retinitis pigmentosa GTPase
regulator"; Mus musculus retinitis pigmentosa GTPase regulator (Rpgr)
mRNA, complete cds. N = 1, Score = 106, P = 0.035

>TREMBL:AF053356_11 product: "ORF4"; Homo sapiens chromosome
7q22 sequence, complete sequence.
Length = 318

HSPs:

Score = 1554 (533.5 bits), Expect = 1.6e-159, P = 1.6e-159
Identities = 303/318 (95%), Positives = 303/318 (95%)

Query:   1 MGEKAVPLLRRRRVKRSCPSCGSELGVEEKRGKGNPISIQLFPPELVEHIISFLPVRDLV 60
           MGEKAVPLLRRRRVKRSCPSCGSELGVEEKRGKGNPISIQLFPPELVEHIISFLPVRDLV
Sbjct:   1 MGEKAVPLLRRRRVKRSCPSCGSELGVEEKRGKGNPISIQLFPPELVEHIISFLPVRDLV 60

Query:  61 ALGQTCRYFHEVCDGEGVWRRICRRLSPRLQDQGSGVRPWKRAAILNYTKGLYFQAFGGR 120
           ALGQTCRYFHEVCDGEGVWRRICRRLSPRLQDQ               TKGLYFQAFGGR
Sbjct:  61 ALGQTCRYFHEVCDGEGVWRRICRRLSPRLQDQD--------------TKGLYFQAFGGR 106

Query: 121 RRCLSKSVAPLLAHGYRRFLPTKDHVFILDYVGTLFFLKNALVSTLGQMQWKRACRYVVL 180
           RRCLSKSVAPLLAHGYRRFLPTKDHVFILDYVGTLFFLKNALVSTLGQMQWKRACRYVVL
Sbjct: 107 RRCLSKSVAPLLAHGYRRFLPTKDHVFILDYVGTLFFLKNALVSTLGQMQWKRACRYVVL 166

Query: 181 CRGAKDFASDPRCDTVYRKYLYVLATREPQEVVGTTSSRACDCVEVYLQSSGQRVFKMTF 240
           CRGAKDFASDPRCDTVYRKYLYVLATREPQEVVGTTSSRACDCVEVYLQSSGQRVFKMTF
Sbjct: 167 CRGAKDFASDPRCDTVYRKYLYVLATREPQEVVGTTSSRACDCVEVYLQSSGQRVFKMTF 226

Query: 241 HHSMTFKQIVLVGQETQRALLLLTEEGKIYSLVVNETQLDQPRSYTVQLALRKVSHYLPH 300
           HHSMTFKQIVLVGQETQRALLLLTEEGKIYSLVVNETQLDQPRSYTVQLALRKVSHYLPH
Sbjct: 227 HHSMTFKQIVLVGQETQRALLLLTEEGKIYSLVVNETQLDQPRSYTVQLALRKVSHYLPH 286

Query: 301 LRVACMTSNQSSTLYVTD 318
           LRVACMTSNQSSTLYVTD
Sbjct: 287 LRVACMTSNQSSTLYVTD 304

Pedant information for DKFZphtes3_22i11, frame 2
Report for DKFZphtes3_22i11.2

[LENGTH] 580
[pI] 9.01
[HOMOL] TREMBL:AF053356_11 product: "ORF4"; Homo sapiens
chromosome 7q22 sequence, complete sequence. 1e-174
[BLOCKS] BL00625B Regulator of chromosome condensation (RCC1)
proteins
[BLOCKS] BL00625A Regulator of chromosome condensation (RCC1)
proteins
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 3.62 %

SEQ MGEKAVPLLRRRRVKRSCPSCGSELGVEEKRGKGNPISIQLFPPELVEHIISFLPVRDLV
SEG
PRD ccccchhhhhhhhhcccccccccccccccccccccceeeeccccchhhhhhheeeeeeee

SEQ ALGQTCRYFHEVCDGEGVWRRICRRLSPRLQDQGSGVRPWKRAAILNYTKGLYFQAFGGR
SEG
PRD ecccceeeeeeeecceeeeeeeeeecccccccccccccccccchhhhhhccceeeecccc

SEQ RRCLSKSVAPLLAHGYRRFLPTKDHVFILDYVGTLFFLKNALVSTLGQMQWKRACRYVVL
SEG
PRD eeeeccchhhhhhhheeeeccccceeeeeeeeeeecccceeeeeeccchhhhhhhhheee

SEQ CRGAKDFASDPRCDTVYRKYLYVLATREPQEVVGTTSSRACDCVEVYLQSSGQRVFKMTF
SEG
PRD ecccccccccccceeeeeehhhhhhhhccceeeeeccccceeeeeeeeecccceeeeeec

SEQ HHSMTFKQIVLVGQETQRALLLLTEEGKIYSLVVNETQLDQPRSYTVQLALRKVSHYLPH
SEG
PRD ccccceeeeeeeehhhhhhhhhhhhhcceeeeeeeccccccccceeeehhhhhhhhccce

SEQ LRVACMTSNQSSTLYVTDQGGVYFEVHTPGVYRDLFGTLQAFDPLDQQMPLALSLPAKIL
SEG
PRD eeeeeeccccccceeeecccceeeeeccccccccccceeeecccccccceeeeeccceee

SEQ FCALGYNHLGLVDEFGRIFMQGNNRYGQLGTGDKMDRGEPTQVCYLQRPITLWCGLNHSL
SEG
PRD eeeeccccceeeeeceeGeeeccccccccccccccccccccGeeeeccceeeecccccee

SEQ VLSQSSEFSKELLGCGCGAGGRLPGWPKGSASFVKLQVKVPLCACALCATRECLYILSSH
SEG xxxxxxxxxxxxxx
PRD eeeeccccceeeeccccccccccccccccceeeeeeeeeeeeeeeeeeeecccceeeecc

SEQ DIEQHAPYRHLPASRVVGTPEPSLGARAPQDPGGMAQACEEYLSQIHSCQTLQDRTEKMK
SEG
PRD cccccccccccccceeeGcccccccccccccccchhhhhhhhhhhhhcchhhhhhhhhhh

SEQ EIVGWMPLMAAQKDFFWEALDMLQRAEGGGGGVGPPAPET
SEG xxxxxxx
PRD hhhhcchhhhhhhhhhhhhhhhhhhhccccceeecccccc

(No Prosite data available for DKFZphtes3_22i11.2)
(No Pfam data available for DKFZphtes3_22i11.2)
DKFZphtes3_22l24

group: testis derived

DKFZphtes3_22l24 encodes a novel 451 amino acid protein with
similarity to the F-box protein FBL2 of the rat.
No informative BLAST results; no predictive Prosite, Pfam or SCOP
motifs.

The new protein can find application in studying the expression
profile of testis-specific genes.
similarity to p37NB (Homo sapiens)
Sequenced by LMU
Locus: /map="7q22-q31.1"
Insert length: 1537 bp

Poly A stretch at pos. 1459, no polyadenylation signal found



1 CAACAGGACG ATGCGACTCC TGCCGAGGCA CTTCCACAAC TTACAGAATC
51 TTAGTTTGGC TTATTGCAGA CGGTTCACAG ACAAAGGCTT ACAGTACCTG
101 AACTTGGGGA ATGGATGCCA CAAGCTCATC TATCTGGACC TCTCTGGCTG
151 CACCCAGATT TCAGTCCAAG GCTTCAGGTA CATTGCAAAC AGCTGCACTG
201 GAATTATGCA TCTTACCATT AATGACATGC CAACTCTGAC GGACAACTGT
251 GTAAAAGCTT TAGTTGAAAA ATGCTCTCGT ATTACATCGC TGGTTTTCAC
301 TGGTGCACCG CATATCTCCG ATTGTACTTT CAGAGCTCTT TCTGCTTGTA
351 AACTCAGAAA GATCCGATTT GAAGGAAATA AAAGGGTTAC TGATGCATCC
401 TTCAAATTTA TAGACAAGAA TTATCCAAAT CTCAGTCACA TTTATATGGC
451 TGACTGCAAG GGAATAACAG ACAGCAGCCT CAGATCCCTT TCACCTTTGA
501 AGCAACTGAC TGTGTTGAAT TTGGCAAATT GTGTAAGAAT TGGTGATATG
551 GGACTAAAGC AATTTCTTGA TGGTCCTGCA AGCATGAGGA TAAGAGAGCT
601 AAATTTAAGC AACTGTGTGC GGCTAAGTGA TGCCTTTGTT ATGAAACTAT
651 CTGAGCGCTG CCCTAATTTA AACTACTTGA GTTTACGAAA TTGTGAACAT
701 TTGACTGCCC AAGGAATTGG ATATATTGTA AACATCTTTT CCTTGGTATC
751 AATAGATCTC TCTGGAACAG ACATCTCTAA TGAGGGTTTG AATGTGCTTT
801 CCAGACATAA AAAATTGAAG GAACTTTCTG TATCTGAATG TTATAGAATC
851 ACTGATGATG GAATTCAGGC ATTCTGCAAA AGCTCACTGA TCTTGGAACA
901 TTTGGATGTC TCTTATTGCT CCCAGCTGTC AGATATGATT ATCAAAGCAC
951 TGGCCATTTA CTGCATTAAC CTCACATCTC TCAGCATTGC TGGCTGTCCA
1001 AAGATTACTG ACTCAGCAAT GGAGATGTTA TCGGCAAAAT GCCATTACCT
1051 GCACATTTTG GATATCTCTG GTTGTGTCTT GCTTACTGAC CAAATCCTTG
1101 AGGACCTTCA GATAGGCTGC AAACAACTCC GGATCCTTAA GATGCAATAC
1151 TGCACAAATA TTTCCAAGAA GGCAGCTCAA AGAATGTCAT CTAAAGTTCA
1201 GCAGCAGGAA TACAACACTA ATGACCCTCC ACGTTGGTTT GGCTATGATA
1251 GGGAAGGAAA CCCTGTTACA GAGCTTGACA ACATAACATC ATCTAAAGGA
1301 GCCTTAGAAT TAACAGTGAA AAAGTCAACA TACAGCAGTG AAGACCAAGC
1351 AGCGTGACCT TCAGCCTCAA GCAGGAAGAA CAAAAAATCA AGAACTTGGC
1401 AAGTTTTCTC CATTTGTTGC AAGTATGTTT ACTAGCTGAA TCTCAATAAC
1451 AATGTAAACA AGCAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
1501 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAG
BLAST Results

Entry AC005250 from database EMBL:
Homo sapiens BAC clone RG31-AM05 from 7q22-q31.1, complete
sequence.
Score = A30-. P = I.fie-IEL*-, identities = 160/n3 

Entry HS32907 from database EMBL:
Human p37NB mRNA, complete cds.
Score = 31A-. P = M-be-Cm, identities = 7D/76 



Medline entries

97136875:
Kim D, Laquaglia MP, Yang SY; A cDNA encoding a putative 37 kDa
leucine-rich repeat (LRR) protein, p37NB, isolated from S-type
neuroblastoma cell has a differential tissue distribution. Biochim
Biophys Acta Dec 11;1309(3):183-8


Peptide information for frame 2

ORF from 11 bp to 1354 bp; peptide length: 448
Category: similarity to known protein
Classification: unclassified

1 MRLLPRHFHN LQNLSLAYCR RFTDKGLQYL NLGNGCHKLI YLDLSGCTQI
51 SVQGFRYIAN SCTGIMHLTI NDMPTLTDNC VKALVEKCSR ITSLVFTGAP
101 HISDCTFRAL SACKLRKIRF EGNKRVTDAS FKFIDKNYPN LSHIYMADCK
151 GITDSSLRSL SPLKQLTVLN LANCVRIGDM GLKQFLDGPA SMRIRELNLS
201 NCVRLSDAFV MKLSERCPNL NYLSLRNCEH LTAQGIGYIV NIFSLVSIDL
251 SGTDISNEGL NVLSRHKKLK ELSVSECYRI TDDGIQAFCK SSLILEHLDV
301 SYCSQLSDMI IKALAIYCIN LTSLSIAGCP KITDSAMEML SAKCHYLHIL
351 DISGCVLLTD QILEDLQIGC KQLRILKMQY CTNISKKAAQ RMSSKVQQQE
401 YNTNDPPRWF GYDREGNPVT ELDNITSSKG ALELTVKKST YSSEDQAA



BLASTP hits
No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_22l24, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphtes3_22l24, frame 2

Report for DKFZphtes3_22l24.2

[LENGTH] 451
[MW] 50545.95
[pI] 8.68
[HOMOL] TREMBLNEW:AF186273_1 product: "leucine-rich repeats
containing F-box protein FBL3"; Homo sapiens leucine-rich repeats
containing F-box protein FBL3 mRNA, complete cds. 8e-31
[FUNCAT] 11.01 stress response [S. cerevisiae, YJR090c] 8e-20
[FUNCAT] 03.01 cell growth [S. cerevisiae, YJR090c] 8e-20
[FUNCAT] 08.19 cellular import [S. cerevisiae, YJR090c] 8e-20
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae,
YJR090c] 8e-20
[FUNCAT] 03.04 budding, cell polarity and filament formation
[S. cerevisiae, YJR090c] 8e-20
[FUNCAT] 01.05.04 regulation of carbohydrate utilization
[S. cerevisiae, YJR090c] 8e-20
[FUNCAT] 11.04 dna repair (direct repair, base excision repair
and nucleotide excision repair) [S. cerevisiae, YJR052w] 3e-07
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YJR052w] 3e-07
[BLOCKS] PROOOnB
[BLOCKS] PR003LHD
[BLOCKS] BP01TE1A
[BLOCKS] BP037M3B
[PIRKW] tandem repeat 2e-18
[PIRKW] zinc finger 1e-07
[PIRKW] DNA binding 1e-07
[SUPFAM] leucine-rich alpha-2-glycoprotein repeat homology 2e-18
[SUPFAM] regulatory protein ESAG8c 1e-07
[KW] Alpha_Beta

SEQ NRTMRLLPRHFHNLQNLSLAYCRRFTDKGLQYLNLGNGCHKLIYLDLSGCTQISVQGFRY
PRD ccccccccccccccccceeeeeecccccccceeeecccccceeecccccccccccccccc

SEQ IANSCTGIMHLTINDMPTLTDNCVKALVEKCSRITSLVFTGAPHISDCTFRALSACKLRK
PRD ccccccceeeeeccccccccchhhhhhhhhhhccccccccccccccccccccccccceee

SEQ IRFEGNKRVTDASFKFIDKNYPNLSHIYMADCKGITDSSLRSLSPLKQLTVLNLANCVRI
PRD eeecccccccccccccccccccceeeeeeeccccccchhhhhhhhcccccccceeeeeec

SEQ GDMGLKQFLDGPASMRIRELNLSNCVRLSDAFVMKLSERCPNLNYLSLRNCEHLTAQGIG
PRD cccccccccccccccccceeeeccccccccccchhhhhhcccccccccccccccccccee

SEQ YIVNIFSLVSIDLSGTDISNEGLNVLSRHKKLKELSVSECYRITDDGIQAFCKSSLILEH
PRD eeccccceeeeeecccccccccchhhhhhcccccccccccccccchhhhhcccccccccc

SEQ LDVSYCSQLSDMIIKALAIYCINLTSLSIAGCPKITDSAMEMLSAKCHYLHILDISGCVL
PRD cceeecccccchhhhhhhhccccceeeeeecccccchhhhhhhhhhccceeeeecccccc

SEQ LTDQILEDLQIGCKQLRILKMQYCTNISKKAAQRMSSKVQQQEYNTNDPPRWFGYDREGN
PRD chhhhhhhhhhcchhhhhhcceeeeechhhhhhhhhhhhheeeccccccccccccccccc

SEQ PVTELDNITSSKGALELTVKKSTYSSEDQAA
PRD ccccccccccccceeeeGccccccccccccc

(No Prosite data available for DKFZphtes3_22l24.2)
(No Pfam data available for DKFZphtes3_22l24.2)
DKFZphtes3_26g3

group: testis derived

DKFZphtes3_26g3 encodes a novel 1090 amino acid protein without
similarity to known proteins.

No informative BLAST results; no predictive Prosite, Pfam or SCOP
motifs.

The new protein can find application in studying the expression
profile of testis-specific genes.

similarity to C. elegans C09D4.4

on genomic level encoded by HSDJIIAI^ 
perhaps complete cds.

Sequenced by EMBL

Locus: /map="6"

Insert length: 4562 bp

Poly A stretch at pos. 4550, polyadenylation signal at pos. 4515



1 GATTCAGTTA CTGAAGACTT AGATGCACCC TGGATGGGAA TTCAGAATCT
51 TCAGAGATCA GAGTCCAGTA AAATGGATAA ATATGAGACT GAAGAAAGCT
101 CTGTAGCAGG ACTTTCTAGC CCAGAGTTGA AAGTCAGACC TGCTGGTGCC
151 TCCAGTATTT GGTATACAGA AGGTGAAAAG CAGCTAACAA AATCTCTAAA
201 AGGAAAGAAT GAAGAATCAA ATAAATCCAA AGTTAAGGTT ACTAAGCTTA
251 TGAAAACAAT GAAATCTGAA AACACAAAAA AATTAATAAA ACAGAACTCT
301 AAGGATTCTG TGGTTTTGGT AGGCTACAAA TGTTTGAAAA GTACAGCATC
351 AAATGATCTC ATTAAATGCT TTGAAGGCAA TCCTTCACAT AGTCAGAAGG
401 AAGGTCTGGA TCCCACAATA TGTGGATATA ATTTTGACCC AAAGACCTAC
451 ATGAGACAGA CAAGTCAAAA GGAAGCTAGC TGTTTGCCAA CTAATACAGA
501 GAGAACTGAA CAAAAGTCTC CAGATATTGA AAATGTTCAA CCAGACCAGT
551 TTGATCCTTT GAACTCTGGC AACCTAAATC TTTGTGCAAA TTTGTCCATT
601 TCAGGTAAAC TTGATATCTC CCAGGACGAT AGTGAAATTA CACAAATGGA
651 ACACAATCTG GCATCCAGAA GGTCATCAGA CGATTGCCAT GATCATCAAA
701 CAACCCCATC TTTGGGAGTT AGAACAATTG AAATAAAGCC CAGTAATAAA
751 GATCCTTTCA GTGGAGAGAA TATAACTGTC AAACTAGGAC CTTGGACAGA
801 GCTTCGACAA GAGGAAATAC TTGTGGATAA TTTACTACCC AACTTTGAGT
851 CCTTAGAATC TAATGGTAAA TCTAAATCTA TAGAAATAAC ATTTGAAAAG
901 GAAGCTTTGC AAGAAGCAAA GTGTCTTTCT ATTGGAGAAT CATTAACTAA
951 ATTACGAAGT AATCTACCTG CCCCTTCTAC AAAAGAATAT CATGTTGTAG
1001 TAAGTGGAGA TACAATTAAG TTACCAGATA TTAGTGCCAC ATATGCCTCA
1051 TCTAGATTTT CAGATTCAGG TGTTGAAAGT GAACCGAGTT CTTTTGCGAC
1101 ACATCCAAAC ACTGATTTAG TCTTTGAAAC TGTGCAAGGG CAAGGTCCTT
1151 GCAATAGTGA AAGATTATTT CCTCAGCTTT TGATGAAACC TGATTATAAT
1201 GTAAAATTTT CATTAGGAAA TCATTGTACT GAGAGTACAA GTGCTATAAG
1251 TGAAATACAG TCATCTTTGA CATCCATAAA CTCTCTACCC TCCGATGATG
1301 AACTGTCACC TGATGAAAAT TCTAAGAAAT CTGTTGTACC TGAATGCCAT
1351 CTAAATGATA GCAAAACTGT ATTAAATCTA GGAACGACTG ATTTGCCAAA
1401 ATGTGATGAT ACTAAAAAGT CAAGTATCAC TTTGCAACAG CAGAGTGTTG
1451 TATTTTCAGG GAACTTGGAC AATGAAACTG TAGCAATACA TTCCTTAAAT
1501 TCAAGCATTA AAGACCCTTT ACAATTTGTT TTTTCAGATG AAGAGACTTC
1551 CAGTGATGTG AAAAGTAGTT GCAGCTCCAA ACCTAACTTG GATACTATGT
1601 GTAAAGGCTT CCAGAGTCCT GATAAATCTA ATAACTCTAC AGGGACAGCA
1651 ATTACATTAA ATTCAAAACT GATTTGTTTA GGCACTCCTT GTGTCATTTC
1701 AGGTTCCATT TCTAGTAATA CAGATGTTAG TGAAGATAGA ACTATGAAAA
1751 AAAATAGTGA TGTATTAAAT CTCACACAGA TGTATTCAGA AATCCCTACA
1801 GTTGAAAGTG AAACTCATCT GGGTACAAGT GATCCTTTTT CAGCCAGTAC
1851 TGATATAGTA AAGCAAGGGC TTGTGGAAAA TTATTTTGGT TCTCAAAGCA
1901 GTACGGATAT TTCTGACACA TGTGCTGTTA GCTACAGCAA TGCACTTAGC
1951 CCTCAGAAGG AAACTTCTGA AAAAGAAATT AGTAATCTTC AGCAGGAACA
2001 GGATAAAGAG GATGAGGAGG AAGAGCAGGA TCAACAAATG GTTCAAAATG
2051 GGTACTATGA AGAAACAGAT TATTCAGCTT TGGATGGAAC AATAAATGCT
2101 CACTATACAA GCAGAGATGA ACTAATGGAA GAAAGACTTA CAAAATCTGA
2151 AAAAATAAAC AGTGACTATC TGAGAGATGG TATAAACATG CCTACTGTCT
2201 GTACTTCTGG TTGTTTGTCC TTCCCGTCTG CACCACGAGA GTCTCCTTGT
2251 AATGTTAAAT ATTCTTCCAA AAGTAAATTT GATGCCATTA CAAAGCAGCC
2301 AAGCAGTACT TCTTACAACT TCACTTCTTC GATTTCCTGG TATGAAAGTT
2351 CACCAAAACC TCAAATACAA GCCTTCCTTC AGGCAAAAGA AGAACTGAAG
2401 CTACTAAAAC TTCCTGGGTT CATGTACAGT GAAGTTCCTC TGCTGGCATC
2451 CTCAGTACCT TATTTTAGTG TAGAAGAAGA GGGTGGTTCT GAAGATGGAG
2501 TACATCTGAT TGTCTGTGTG CACGGTTTAG ATGGAAACAG TGCAGATCTC
2551 CGATTAGTAA AAACTTACAT TGAACTTGGA TTGCCTGGGG GAAGAATTGA
2601 TTTTCTTATG TCTGAGAGAA ATCAGAATGA TACTTTTGCT GATTTTGATA
2651 GCATGACTGA TCGTCTTTTG GATGAGATAA TACAGTATAT TCAGATATAT
2701 AGTCTAACAG TCTCAAAAAT AAGCTTTATT GGACATTCGT TGGGCAATTT
2751 AATAATTCGT TCAGTGCTTA CAAGGCCAAG GTTTAAATAT TACCTCAACA
2801 AACTTCATAC CTTTCTGTCT CTTTCTGGAC CTCACCTTGG TACACTCTAC
2851 AACAGCAGTG CTCTTGTTAA TACAGGTCTC TGGTTTATGC AGAAATGGAA
2901 AAAATCAGGT TCGCTTTTGC AGCTGACATG TCGAGATCAC TCAGACCCTC
2951 GCCAAACTTT TTTATATAAG CTTAGTAACA AAGCAGGGCT TCATTATTTC
3001 AAAAATGTTG TGCTAGTGGG ATCCCTACAG GATCGCTATG TTCCTTATCA
3051 CTCTGCCCGC ATTGAAATGT GTAAAACAGC TTTAAAGGAC AAACAGTCAG
3101 GACAGATCTA TTCAGAAATG ATCCACAACT TGCTTCGACC CGTTCTGCAA
3151 AGCAAGGACT GTAATTTGGT TCGCTATAAT GTCATCAATG CATTGCCCAA
3201 TACAGCTGAT TCACTCATTG GGAGAGCTGC ACATATAGCT GTTCTTGATT
3251 CGGAAATATT TTTAGAGAAA TTCTTTCTGG TTGCTGCCCT CAAATATTTC
3301 CAATAGTATA AAAGCATTGT TAGCGACTGG ACAATTACCT CATTCAACAA
3351 TGTTTCAAAT AATGTATTAT ATTAAAATGT AGATGCTGAT AAGTTCTAAG
3401 AAATATTTAT ACCTTTTTAT ATGGAAGATA ATTTATATCA TCCATGTTTA
3451 GTGCTTTTTA AACATCAACT TTACTTTCTA GGTAATGTGG CTGTGCAATA
3501 TTTTTTTAAT TTTATCTTTT TACTTTTCTA TTACTTTTTC ATATATTTTG
3551 CTACCTAAGT ATTTCAGTGA AACTTTAAGC CCATACCTGT GTCTGATTGT
3601 TTATTATTGG CTTTCCACAA TTCTTACATC AGACTACATT ATATTAGAGA
3651 CCATTATTGC TAGAATAGCA TGGGATTTAA AATTTCTAAT ACTGGGGGTA
3701 TTATTTAGTT AATTATAAAT TTTTCTTTTC ACATTTTACT GTGTTTTAAC
3751 TGGAAATAAA ATTATGGCTG CTACAATATA TTTTTTGAAA TCAACTTCTG
3801 TAGTTCTAAA ATACAACTTT ATCATACAAT CAAACCAGGT AGTTCATATA
3851 AAACAGTGTA ATACAAGTTT TCTATAAAGT CATTACTGTT GCTTAAACAT
3901 ATTTCATGCC TATTAAAATA TATTTTCTAC TGGTGATTTC AACATTATTT
3951 CTCATACTGA CTTTTATTAC TGGAAATGTT CCTGTACATG TTGGCAGCAG
4001 ATAAAGATTT TTGAATGTTT GAATGCCCTC TGCCTTGATT TGGTTGGATT
4051 TTGCTAATTG GTATGTTGCT TGAACTTTAT GACTACATTT TCTTTTAACT
4101 TTTTTCATGG ACTTCCTTAT ATGTACATAA TAATTAAATG TTGAAATTTA
4151 TGAAATACTT TTATGAATTT AGATAATTTT TAAATATTGT TAAAATTTAT
4201 TGAACTAAAA AGTAATGTAA ATAAAATAAT TCATGTTAAA GATGGAACAA
4251 AATAATTAAC TTTACATGTT TGGTGATACA GATGCAAATG TTTTTGATAT
4301 ATGGAGATGT TGAGTCTTTT GACTTTACTA AAGGTGCTGA ATAGCATTAA
4351 ATTCACTATT TTCCTTTTCT GTTTTACTTG TGAAAATAAA AATGCACTAA
4401 GGTTGGGTAG AAGTTCTGTT TGCACTCACT AATTGTGACA GACAGAGGTT
4451 TTTGTAAGTA TTTATTGTAC AATTGATGCA TGTTTATTTT TAGCGTTGTT
4501 ATTGCCTCTG GTGTTAATAA ATGAACAAAT GGCTATCTGG AGGAACAGCT
4551 AAAAAAAAAA AA



BLAST Results 



Entry HSDJlTfllT from database EMBLNEW: 

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 
Score = 7EE1-, P = D-Oe+OD-. identities = 14SS/14bl 



Medline entries 



No Medline entry



Peptide information for frame 1 



ORF from 34 bp to 3303 bp; peptide length: 1090
Category: similarity to unknown protein
Classification: no clue

1 MGIQNLQRSE SSKMDKYETE ESSVAGLSSP ELKVRPAGAS SIWYTEGEKQ
51 LTKSLKGKNE ESNKSKVKVT KLMKTMKSEN TKKLIKQNSK DSVVLVGYKC
101 LKSTASNDLI KCFEGNPSHS QKEGLDPTIC GYNFDPKTYM RQTSQKEASC
151 LPTNTERTEQ KSPDIENVQP DQFDPLNSGN LNLCANLSIS GKLDISQDDS
201 EITQMEHNLA SRRSSDDCHD HQTTPSLGVR TIEIKPSNKD PFSGENITVK
251 LGPWTELRQE EILVDNLLPN FESLESNGKS KSIEITFEKE ALQEAKCLSI
301 GESLTKLRSN LPAPSTKEYH VVVSGDTIKL PDISATYASS RFSDSGVESE
351 PSSFATHPNT DLVFETVQGQ GPCNSERLFP QLLMKPDYNV KFSLGNHCTE
401 STSAISEIQS SLTSINSLPS DDELSPDENS KKSVVPECHL NDSKTVLNLG
451 TTDLPKCDDT KKSSITLQQQ SVVFSGNLDN ETVAIHSLNS SIKDPLQFVF
501 SDEETSSDVK SSCSSKPNLD TMCKGFQSPD KSNNSTGTAI TLNSKLICLG
551 TPCVISGSIS SNTDVSEDRT MKKNSDVLNL TQMYSEIPTV ESETHLGTSD
601 PFSASTDIVK QGLVENYFGS QSSTDISDTC AVSYSNALSP QKETSEKEIS
651 NLQQEQDKED EEEEQDQQMV QNGYYEETDY SALDGTINAH YTSRDELMEE
701 RLTKSEKINS DYLRDGINMP TVCTSGCLSF PSAPRESPCN VKYSSKSKFD
751 AITKQPSSTS YNFTSSISWY ESSPKPQIQA FLQAKEELKL LKLPGFMYSE
801 VPLLASSVPY FSVEEEGGSE DGVHLIVCVH GLDGNSADLR LVKTYIELGL
851 PGGRIDFLMS ERNQNDTFAD FDSMTDRLLD EIIQYIQIYS LTVSKISFIG
901 HSLGNLIIRS VLTRPRFKYY LNKLHTFLSL SGPHLGTLYN SSALVNTGLW
951 FMQKWKKSGS LLQLTCRDHS DPRQTFLYKL SNKAGLHYFK NVVLVGSLQD
1001 RYVPYHSARI EMCKTALKDK QSGQIYSEMI HNLLRPVLQS KDCNLVRYNV
1051 INALPNTADS LIGRAAHIAV LDSEIFLEKF FLVAALKYFQ



BLASTP hits
No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_26g3, frame 1

No Alert BLASTP hits found

Pedant information for DKFZphtes3_26g3, frame 1

Report for DKFZphtes3_26g3.1

[LENGTH] 1101
[MW] 122245.22
[pI] 5.12
[HOMOL] TREMBL:CEAF2nt,_l gene: "C09D4.4"; Caenorhabditis
elegans cosmid C09D4. 2e-38
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YOR059c] 2e-06
[BLOCKS] BL00120B
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 6.72 %

25 SE(3 DSVTEDLDAPUMGIflNLflRSESSKnDKYETEESSVAGLSSPELKVRPAGASSIWYTEGEK 

SEG 

PRD ccccccccccceeeeechhhhhhhhhhccccccccccccccceeeeccccceeeccccch 

SEt3 (3LTKSLKGKNEESNKSKVKVTKLHKTnKSENTK<LIK(3NSKDSVVLVGYKCLKSTASNI>L 

30 SEG xxxxxxxxxxxxxxx 

PRD hhhhhhccccccccceeeehhhhhhhhhcccccceeecccccceeeeeeeeccccccccc 

SE<2 IKCFEGNPSHS(3KEGLPPTICGYNFDPKTYHR(2TS(3KEASCLPTNTERTE(2KSPI>IENV(2 

SEG 

35 PRD eeeecccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SE<2 s PD(3FDPLNSGNLNLCANLSISGKLDIS(3DDSEIT(3I1EHNLASRRSSDDCHDH(3TTPSLGV 

SEG - 

PRD ccccccccccceeecccccccccccccccccccchhhhhhhcccccccccccccccccee 

40 

SE<2 RTIEIKPSNKDPFSGENITVKLGPUTELRflEEILVDNLLPNFESLESNGKSKSIEITFEK 

SEG 

PRD eeeeecccccccccccceeeccccchhhhhhhhhhhccccccccccccccceeeehhhhh 

45 SEd EALflEAKCLSIGESLTKLRSNLPAPSTKEYHVVVSGDTIKLPDISATYASSRFSDSGVES 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhccccccccceeeeecccccccccccccccccccccccccc 

SE<2 EPSSFATHPNTDLVFETVdG(2GPCNSERLFP(2LLnKPDYNVKFSLGNHCTESTSAISEI(3 

50 SEG 

PRD ccccccccccceeeeeeeccccccccccccccccccccceeeeecccccccccchhhhhh 

SEfl SSLTSINSLPSDDELSPDENSKKSVVPECHLNDSKTVLNLGTTDLPKCDDTKKSSITLfifl 

SEG 

55 PRD cccccccccccccccccccccccccccccccccccceeecccccccccccccccceeecc 

SEfi <3SVVFSGNLDNETVAIHSLNSSIKDPL£FVFSDEETSSDVKSSCSSKPNLDTMCKGF(2SP 
SEG xxxxxxxxxx 
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PRD eeeeeecccccceeeeeeeccccccceeeeeccccccceeeccccccccccccccccccc 

SE(3 DKSNNSTGTAITLNSKLICLGTPCVISGSISSNTDVSEDRTMKKNSDVLNLT(3nYSEIPT 

SEG 

PRD cccccccccccccceeeeeeeccceeeeecccccccccccccccccchhhhhhheeeeec 

SEt2 VESETHLGTSDPFSASTDIVK(3GLVENYFGS(3SSTDIS1)TCAVSYSNALSP(3KETSEKEI 

SEG 

PRD cccccccccccccccceeeeeeeeeeeecccccccccceeeeeecccccccccccccccc 

SE(2 SNL(3(3E(3DKEDEEEEi2D(2l3nVt3NGYYEETDYSALDGTINAHYTSRDELI1EERLTKSEKIN 

SEG - - - xxxxxxxxxxxxxxxx 

PRD cchhhhhcccchhhhhhhhhhcccccccccccccccceeeeccchhhhhhhhhhhhhccc 

15 SEG SDYLRDGINMPTVCTSGCLSPPSAPRESPCNVKYSSKSKFDAITKfiPSSTSYNFTSSISU 

SEG xxxxxxxxxxxx- 

PRD ccccccccccccccccceeecccccccccceeeecccccceeeeGccccccceeecceee 

SE(2 YESSPKP(3I(3AFL(3AKEELKLLKLPGFHYSEVPLLASSVPYFSVEEEGGSEDGVHLIVCV 

20 SEG xxxxxxxxx xxxxxxxxxxxx 

PRD ccccccccchhhhhhhhhhhhhccccceeeGeeeeeecccceeeeeccccccceeeeeee 

SE<3 HGLDGNSADLRLVKTYIELGLPGGRIDFLMSERN(3NDTFADFDSMTDRLLDEII(3YI(3IY 

SEG 

25 PRD eccccccchhhhhhhhhhhccccccchhhhhccccccccccccchhhhhhhhhhhhhhhh 

SE<2 SLTVSKISFIGHSLGNLIIRSVLTRPRFKYYLNKLHTFLSLSGPHLGTLYNSSALVNTGL 

SEG 

PRD hccccccccccccceeeeeeeeccccchhhhhhhhhhccccccccceeeeccccccccch 

30 

SEG UFI1(3KbJKKSGSLL(3LTCRDHSDPR(2TFLYKLSNKAGLHYFKNVVLVGSL(3DRYVPYHSAR 

SEG - 

PRD hhhhhhhhhheeeeeecccccccceeeeeeccccceeeeeeeeeeeccccccceeehhhh 

35 SEA IEnCKTALKDK(3SG(3IYSEriIHNLLRPVLt3SKDGNLVRYNVINALPNTADSLIGRAAHIA 

SEG 

PRD hhhhhhhccccccchhhhhhhhhhhccccccccceeeeeeecccccccccchhhhhhhhh 

SEd VLDSEIFLEKFFLVAALKYFd 

40 SEG 

PRD hhhhhhhhhhhhhhhhhhccc 
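
The PRD rows above carry one predicted secondary-structure state per residue. A minimal sketch for summarising such rows, assuming the common convention h = helix, e = strand, c = coil (the report itself does not spell out the alphabet) and using a hypothetical helper name:

```python
from collections import Counter


def ss_composition(prd_rows):
    """Return the fraction of residues per predicted state (h, e, c).

    prd_rows: list of strings, one per PRD row, without the "PRD" label.
    Characters other than h, e, c (for example OCR debris) are ignored.
    """
    states = "".join(prd_rows).lower()
    counts = Counter(ch for ch in states if ch in "hec")
    total = sum(counts.values())
    if total == 0:
        return {}
    return {state: counts[state] / total for state in "hec"}


# Last PRD row of the report above.
print(ss_composition(["hhhhhhhhhhhhhhhhhhccc"]))
```

Feeding all PRD rows of one report into the function gives a quick check against the class label reported under the Pedant keywords (for example Alpha_Beta).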



45 



(No Prosite data available for DKFZphtes3_2kg3.1) 
(No Pfam data available for DKFZphtes3_Hbg3.1) 

group: signal transduction 

DKFZphtes3_29f24 encodes a novel 526 amino acid protein with 
similarity to murine net1a. 

The closely related mNET1 activates signalling pathways in 
addition to those directly controlled by activated RhoA. The 
novel protein is expressed ubiquitously. 

The new protein can find application in modulating/blocking 
signalling pathways. 

similarity to net1a (Mus musculus) 
perhaps complete cds. 
Sequenced by DKFZ 

Locus: /map="72.40 cR from top of Chr3 linkage group" 

Insert length: 3559 bp 

Poly A stretch at pos. 3534, polyadenylation signal at pos. 3513 
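
Positions such as the poly-A stretch and the polyadenylation signal reported above can be located programmatically. A minimal sketch, assuming 0-based offsets, the canonical AATAAA hexamer and a minimum run of 10 adenines; the tool and thresholds actually used for this report are not stated in the source, so the function name and parameters are hypothetical:

```python
import re


def find_polya_features(seq, min_run=10):
    """Return (signal_pos, polya_pos) as 0-based offsets, or None if absent."""
    seq = "".join(seq.split()).upper()
    # First run of at least `min_run` adenines (assumed threshold).
    run = re.search("A{%d,}" % min_run, seq)
    polya_pos = run.start() if run else None
    # Last AATAAA hexamer upstream of the poly-A run (whole sequence if no run).
    upstream = seq if polya_pos is None else seq[:polya_pos]
    signal_pos = upstream.rfind("AATAAA")
    return (signal_pos if signal_pos >= 0 else None, polya_pos)


toy = "CCCAATAAACCCGGG" + "A" * 12
print(find_polya_features(toy))
```

Variant signals such as ATTAAA are not covered by this sketch; a fuller implementation would search for a set of hexamers.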



1 CGCCGCCGCC CGGCATCGTG GAGCTGGGGC CCCCTTTTGC CTGGGAGTTT 
51 TGTAGTCGCC TAGGGTCAGC GGTGACATCC CAAAGGGCAG GCCCGGCAGC 
101 CGCCATGGTG GCCAAGGATT ACCCCTTCTA CCTCACGGTC AAGAGAGCGA 
151 ACTGCAGCCT GGAGCTACCC CCGGCCAGCG GTCCGGCCAA GGACGCTGAG 
201 GAGCCTAGTA ATAAACGGGT CAAACCCCTT TCCCGAGTCA CGTCGCTAGC 
251 AAACCTCATC CCGCCCGTGA AGGCCACGCC ATTAAAGCGC TTCAGTCAAA 
301 CCCTGCAGCG CTCCATTAGC TTCCGCAGTG AGAGCCGCCC TGACATCCTC 
351 GCCCCCCGAC CCTGGTCCAG AAATGCCGCC CCCTCGAGCA CGAAACGGAG 
401 AGATAGCAAG CTGTGGAGTG AGACCTTCGA TGTGTGCGTC AATCAGATGC 
451 TTACATCCAA GGAAATCAAA CGTCAGGAGG CGATCTTTGA GCTTTCCCAA 
501 GGAGAAGAAG ACTTGATAGA AGACTTGAAA TTAGCAAAAA AGGCCTATCA 
551 TGACCCCATG CTGAAACTCT CCATAATGAC AGAACAAGAG TTGAATCAAA 
601 TTTTTGGAAC ACTGGACTCT CTAATTCCTC TACATGAAGA GCTCCTTAGT 
651 CAGCTTCGAG ATGTTAGGAA GCCTGATGGC TCGACTGAAC ATGTTGGTCC 
701 CATCCTCGTG GGCTGGCTCC CTTGCCTCAG CTCCTATGAT AGCTACTGCA 
751 GCAATCAAGT AGCCGCCAAA GCTCTGCTGG ACCACAAAAA GCAAGATCAC 
801 CGAGTCCAGG ATTTCCTACA GCGATGTTTA GAATCCCCCT TTAGCCGCAA 
851 ACTAGATCTC TGGAATTTCC TCGATATTCC AAGAAGCCGC CTGGTAAAAT 
901 ACCCTCTGCT TCTCCGAGAA ATCTTGAGGC ACACACCAAA TGATAATCCA 
951 GATCAGCAGC ACTTGGAAGA AGCTATAAAT ATCATTCAGG GAATTGTGGC 
1001 AGAAATCAAC ACCAAGACTG GTGAATCTGA ATGCCGCTAT TATAAAGAGC 
1051 GGCTTCTTTA CTTGGAAGAA GGCCAGAAAG ACTCCCTGAT CGACAGCTCT 
1101 CGAGTCTTGT GTTGTCATGG TGAACTGAAG AACAATCGGG GCGTGAAACT 
1151 GCATGTTTTC CTGTTCCAAG AAGTGCTTGT GATCACTCGA GCCGTCACCC 
1201 ACAATGAGCA GCTTTGCTAC CAGCTGTACC GTCAGCCAAT CCCCGTGAAA 
1251 GACCTCCTGC TGGAAGACCT CCAGGATGGA GAAGTGAGGC TGGGTGGCTC 
1301 CCTGCGAGGG GCATTCAGCA ACAATGAGAG AATTAAAAAC TTCTTCAGAG 
1351 TCAGTTTCAA AAATGGATCC CAAAGTCAGA CCCACTCGCT ACAAGCCAAT 
1401 GACACTTTCA ACAAACAGCA GTGGCTTAAC TGTATTCGTC AAGCCAAAGA 
1451 AACAGTTTTG TGTGCTGCCG GGCAAGCTGG GGTGCTTGAC TCCGAGGGAT 
1501 CGTTCCTAAA TCCCACCACC GGGAGCAGAG AGCTACAGGG AGAAACAAAA 
1551 CTTGAGCAGA TGGACCAATC GGACAGTGAG TCAGACTGTA GTATGGACAC 
1601 GAGTGAGGTC AGCCTCGACT GTGAGCGCAT GGAACAGACA GACTCTTCCT 
1651 GTGGAAACAG CAGGCACGGT GAAAGTAACG TCTGACAGAA GCATGTGCAC 
1701 TTCGGGAAGC AGGCCTGCAT CTTACCTGTA CAGTATTTGC ATTCCACAGA 
1751 TGGAACGGTT TGGAGAAGCA CTTTTTCATA CTTTTGTGAA AGTATACATG 
1801 TTGGCCCAGT CTCTCGTATC TGTACCTTTG TCCCTAGTAC TGTAACTGCC 
1851 AATCTGTCTG TGTAAGCTGG AATCTGTGGC AACTATTACC CTGTGTTGTA 
1901 TTTCCCAAGT GTCTGGATGG ATGGAGAGGT ACTCAAACAA GTTACTTTCA 
1951 GTTGTCCTGC TGGATTTTAA AAAAATAGAA AAAGAATCTC AAAACTACTG 
2001 TTTTACATAG ATTGTTTGAA GAGTCCTTCC TCTTGTGCTT CTGTACCACT 
2051 TTCCCAGCTC TTAGATGTGG TAGCTAAAGG CACGGAATTT AGACGGCCTT 
2101 GTAAATAGGG CATGAGGAAC TCATCTGTGT ATTGGGATGG TATTAGAGAG 
2151 AGAATCAGGA AAGACCAACT CATGAAGTGA ACTTGGTTTG ATCTTACTCA 
2201 ACTAGAAAGC TTGAAAACAT CCCTGGGGAT TCTGAAGGCT TAATTTTGCA 
2251 AAGGAGGATG CATTGTCTGA ACTTTGCAAC TTCATCCAGT GCAAGTTTGA 
2301 TGCAAGAATG TATTAGGACA TAAAATAGAG GCTGACCTTA AAAGGGCCAG 
2351 GACAGAAGCG GCTGCCAGCT CTGAATCTTT AACTGAAATG CACATGGCAC 
2401 CAGGAGGTGT CTCTCATAGT TGGTTGCTAG CCTAAAACAT CAGAATAGAA 
2451 CCCAAAGGGC TTAGGAAGGC CTGCCAGGAT AACAAGAAGG CCCTGTATTC 
2501 ATTGTGTTTC ATCTGCCTAG GCCTACTCAT TATTTTAGAG AATGAATGAA 
2551 GCAACAAGGA AGAGAGACCA TGACTCTATC GATGACACTG TTTATAGAAA 
2601 CACAGGAGAG GAAGAATTTG GAATGAAAAG CACTTCGTCA GAACCTTCTG 
2651 TGGGAGCCAT TGAGAGAAAA GCATGGTCCA GTGCCTTCTG AGAAAGGCCA 
2701 GAGCTTTGGG CTTTCCTGCT CTGCTTTTGG GTCGTCAATT TGCCATCTCT 
2751 GGTTCTGTGC TATAATCAGA ATTGTAATTA TGTTCTCCAG AGGCCAATTT 
2801 CATTAACTCT GATTAATTAG AATCAGCTAG CCAGATTAGT AACCTCTTTG 
2851 TCCAGCCTTG ATTTACAGTG CAGGGTAAAG TGCAGACCTT AAAAACAGCT 
2901 AAGTACCTAG AAGAGCTCCC TGCAAGTGTA AATATTAAGG ATGACCTGTG 
2951 CAAAATTATA CCCACACCAG CACTAGTGGT AATTATTCTA AATTATTGCC 
3001 AAAAAGTTTT TTTTAATCTG TCTTTCAAGT TTACAGAAAA GAAAGCAGTA 
3051 AATGCATTGA TGTCATTTTA TTATGTACAT ATATCATGTG CATTCAAGCT 
3101 GTGTGACAAG ATATATCAAT ATAAAAACAA GGTATATACT TTATTATTTT 
3151 TTGAAAAGAA GGATATTGTG ATCAATTTTA CCCTGTAAAA CATATTTCTG 
3201 TATTTATAGG TCTTAAACAT GATGAATTTT TTCTATTACA AGTTTATTTA 
3251 AAACTGCTTT CTCAAGTCGT TATTGATACA GCAAGTGAAC CTGCTGCAGA 
3301 CAGAAGCAGA GGAAAGCCAA GAACAGCCTT TATTGGTGAA GAAAAGAATG 
3351 AATGATTCTT TGTAGGCGCC ATCAGCCACT TTTAGAAGCC ATCAGCCAGT 
3401 GTGTTGGGAA AAGAGGTTTG TCAAGTGTTG GCCTATGGGA AGGTGGTCAA 
3451 TGAATGTTTT GATGAAATGA ATGTTTTTGT ATAATGGCCT TAAACTTTTC 
3501 TGGAAGTATT TCAAATAAAT TACATTATTA AGTCAAAAAA AAAAAAAAAA 
3551 AAAAAAAAA 
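
The nucleotide listings in this document follow a fixed layout: rows of five 10-base blocks, each row prefixed with the 1-based position of its first base. A minimal sketch of a formatter for that layout (the function name and defaults are illustrative, not taken from the source):

```python
def format_listing(seq, block=10, per_row=5):
    """Format a sequence as position-prefixed rows of space-separated blocks."""
    seq = "".join(seq.split()).upper()  # drop whitespace left over from OCR
    width = block * per_row
    rows = []
    for start in range(0, len(seq), width):
        chunk = seq[start:start + width]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        # Row label is the 1-based position of the first base in the row.
        rows.append(f"{start + 1} " + " ".join(blocks))
    return "\n".join(rows)


print(format_listing("TGGAAGTATTTCAAATAAATTACATTATTAAGTCAAAAAAAAAAAAAAAAAAAAAAAAAAA"))
```

Because the last row label plus the number of bases in that row minus one equals the insert length, the layout also gives a quick consistency check on the reported length.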

45 

BLAST Results 

No BLAST result 

Medline entries 

^fi33LnL>: 

Alberts AS, Treisman R; Activation of RhoA and SAPK/JNK signalling pathways by the RhoA-specific exchange factor mNET1. EMBO J 1998 Jul 15;17(14):4075-85 



Peptide information for frame 3 



ORF from 105 bp to 1682 bp; peptide length: 526 
Category: strong similarity to known protein 
Classification: Cell signaling/communication 
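
The ORF coordinates and the peptide length reported above are tied together by simple arithmetic. A minimal sketch, assuming 1-based inclusive coordinates that exclude the stop codon, and reading the OCR-damaged coordinates as 105 and 1682 (both are assumptions; the helper name is hypothetical):

```python
def peptide_length(orf_start, orf_end):
    """Number of residues encoded by an ORF given 1-based inclusive coordinates."""
    span = orf_end - orf_start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


print(peptide_length(105, 1682))
```

The same check can be applied to the other reports in this section to confirm that a damaged coordinate or length has been read correctly.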



1 MVAKDYPFYL TVKRANCSLE LPPASGPAKD AEEPSNKRVK PLSRVTSLAN 
51 LIPPVKATPL KRFSQTLQRS ISFRSESRPD ILAPRPWSRN AAPSSTKRRD 
101 SKLWSETFDV CVNQMLTSKE IKRQEAIFEL SQGEEDLIED LKLAKKAYHD 
151 PMLKLSIMTE QELNQIFGTL DSLIPLHEEL LSQLRDVRKP DGSTEHVGPI 
201 LVGWLPCLSS YDSYCSNQVA AKALLDHKKQ DHRVQDFLQR CLESPFSRKL 
251 DLWNFLDIPR SRLVKYPLLL REILRHTPND NPDQQHLEEA INIIQGIVAE 
301 INTKTGESEC RYYKERLLYL EEGQKDSLID SSRVLCCHGE LKNNRGVKLH 
351 VFLFQEVLVI TRAVTHNEQL CYQLYRQPIP VKDLLLEDLQ DGEVRLGGSL 
401 RGAFSNNERI KNFFRVSFKN GSQSQTHSLQ ANDTFNKQQW LNCIRQAKET 
451 VLCAAGQAGV LDSEGSFLNP TTGSRELQGE TKLEQMDQSD SESDCSMDTS 
501 EVSLDCERME QTDSSCGNSR HGESNV 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_29f24, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_29f24, frame 3 

Report for DKFZphtes3_29f24.3 



ELENGThO 5b0 
EMliO fe,3EDE.fiS 
EpI] b • DM 

CH0H0L3 TREMBL : AF0^M5E0_1 gene: "Netl n i product: n NETl 

homology Nus musculus NET1 homolog (Netl) mRNA-i complete cds- 
le-lb2 

CFUNCAT3 OT-Ol biogenesis of cell wall IS* cerevisiaei 

YLR3?lwJ 3e-lb 

CFUNCAT3 03.07 pheromone response-i mating-type determination-! 
sex-specific proteins ICS- cerevisiae-i YLR371w3 3e-lb 
CFUNCAT3 1D-0E-0T regulation of g-protein activity ES- 
cerevisiaei YLR371w3 3e-lb 

CFUNCAT3 O^.OM biogenesis of cytoskeleton ES- cerevisiae-i 
YLR371w3 3e-lb 

EFUNCAT3 03-OM budding-i cell polarity and filament formation 
ES. cerevisiaei YLR371wJ 3e-lb 
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WO 01/98454 



PCT/IB01/02050 



EFUNCATID 01-05. DM regulation of carbohydrate utilization CS- 
cerevisiae-i YLR371w3 3e-lb 

EFUNCAT3 3D-03 organization of cytoplasm ES. cerevisiae-i 

YALOmwJ 3e-ll 

5 EFUNCATID 03-52 cell cycle control and mitosis IS- cerevisiaei 

YALOmwID 3e-ll 

EFUNCATJ 1D.05-OT regulation of g-protein activity ES- 
cerevisiae-i YALOMlwl 3e-ll 

EBLOCKSJ PR00510E 

10 EBL0CKSID PROODMIE 

EBL0CKS2 BL007M1B 

EPIRKliU breakpoint cluster region le-Db 

EPIRKliD transmembrane protein 5e-13 

EPIRKLO brain 3e-0b 

15 EPIRKU3 signal transduction Se-13 

EPIRKliO alternative splicing le-Db 

ESUPFAMU CDC2H homology ^e-15 

ESUPFAMU SH5 homology le-ll 

ESUPFAnm CDC25-type guanine nucleotide exchange activator 

20 homology Se-Ofi 

ESUPFAN3 dbl transforming protein Te-Ofl 

ESUPFAH3 protein kinase C zinc-binding repeat homology le-ll 

ESUPFAMD SH3 homology le-ll 

ESUPFAMU bcr protein le-Db 

25 ESUPFAM3 pleckstrin repeat homology 2e-ll 

ESUPFAMJ vav transforming protein le-ll 

EKLO All_Alpha 

30 SE(2 PPPGIVELGPPFAUEFCSRLGSAVTStfRAGPAAANVAKDYPFYLTVKRANCSLELPPASG 
PRD cccceeeeccccccchhhhhhhhhhhhhcccccccccccccceeeecccccccccccccc 

SE(2 PAKDAEEPSNKRVKPLSRVTSLANLIPPVKATPLKRFS(3TL(3RSISFRSESRPI>ILAPRP 
PRD cccccccccccccccccccccccccccccccccccchhhhhhcccccccccccccccccc 



35 



50 



SE<2 USRNAAPSSTKRRDSKLUSETFDVCVNt3nLTSKEIKR(3EAIFELSt3GEEl>LIEI>LKCLAKK 
PRD cccccccccchhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 



SE(2 AYHDPI1LKLSiriTE(2ELN(3IFGTL]>SLIPLHEELLSflLRDVRKPDGSTEHVGPILV6ULP 

40 PRD hhhchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccceeeeeccc 

SEd CLSSYDSYCSNt2VAAKALLI)HKKt3DHRV(3I>FL(3RCLESPFSRKLDLbINFLI>IPRSRLVKY 

PRD cccceeecccchhhhhhhhhhhhcchhhhhhhhhhhcccccccccccceeeccccccchh 

45 SEt3 PLLLREILRHTPNDNPD(3t3HLEEAINII(2GIVAEINTKTGESECRYYKERLLYLEEGt2K:D 

PRD hhhhhhhhhhccccccchhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhcccc 



SEf2 SLIDSSRVLCCHGELKNNRGVKLHVFLF(3EVLVITRAVTHNEt3LCY(3LYR(3PIPVKDLLL 

PRD hhhhhhheeecccccccccccceeeeehhhhhhhhhhhchhhhhhhhhhhhccccccccc 

SE(3 EDL(2DGEVRLGGSLRGAFSNNERIKNFFRVSFKNGS(3S(3THSL(3ANDTFNK(3(2lilLNCIR(2 

PRD ccccccccccccccchhhhhhhhhhhhheeeeccccchhhhhhhhcccchhhhhhhhhhh 



SEt3 AKETVLCAAGd3AGVLDSEGSFLNPTTGSRELC3GETKLEt3nDt2SDSESDCSi1DTSEVSLDC 
55 PRD hhhhhhhhhccceeeeccccccccccccchhhhhhhhhhhhhhccccccccccccccccc 



SEQ ERMEQTDSSCGNSRHGESNV 
PRD cccccccccccccccccccc 

(No Prosite data available for DKFZphtes3_29f24.3) 
(No Pfam data available for DKFZphtes3_29f24.3) 

DKFZphtes3_30p6 

group: testis derived 

DKFZphtes3_30p6 encodes a novel 461 amino acid protein without 
similarity to known proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP 
motif. 

The new protein can find application in studying the expression 
profile of testis-specific genes. 

similarity to C. elegans FmH10-H 

perhaps complete cds. 

Sequenced by LMU 

Locus: unknown 

Insert length: 1944 bp 

Poly A stretch at pos. 1911, no polyadenylation signal found 

1 GGAACAGACC ACTGGGCTGG CAGCTGAGTT GCAGCAGCAG CAGGCTGAGT 
51 ACGAGGACCT TATGGGACAG AAAGATGACC TCAACTCCCA GCTCCAGGAG 
101 TCATTACGGG CCAATAGTCG ACTGCTGGAA CAACTTCAAG AAATAGGGCA 
151 GGAGAAGGAG CAGTTGACCC AGGAATTACA GGAGGCTCGG AAGAGTGCGG 
201 AGAAGCGGAA GGCCATGCTG GATGAGCTAG CAATGGAAAC GCTGCAAGAG 
251 AAGTCCCAGC ACAAGGAAGA GCTGGGAGCA GTTCGTCTAC GGCATGAGAA 
301 GGAGGTGCTG GGGGTGCGTG CCCGCTATGA GCGTGAGCTC CGAGAGCTGC 
351 ATGAAGACAA GAAGCGTCAG GAGGAGGAGC TCCGTGGGCA GATCCGGGAG 
401 GAGAAGGCCC GGACACGGGA GCTGGAGACT CTCCAGCAGA CAGTGGAAGA 
451 ACTTCAAGCT CAGGTACATT CCATGGATGG AGCCAAGGGC TGGTTTGAAC 
501 GGCGCTTGAA GGAAGCCGAG GAATCCCTGC AGCAGCAGCA GCAGGAACAA 
551 GAGGAAGCCC TCAAGCAGTG TCGGGAGCAG CACGCTGCCG AGCTGAAGGG 
601 CAAGGAGGAG GAGCTACAGG ATGTACGGGA TCAGCTCGAG CAGGCCCAGG 
651 AGGAGCGGGA CTGCCACCTG AAGACCATTA GCAGCCTGAA GCAGGAGGTG 
701 AAGGACACAG TGGATGGGCA GAGGATCCTG GAGAAGAAGG GCAGTGCTGC 
751 GCTCAAGGAC CTCAAGCGGC AGCTGCATTT GGAGCGGAAA CGGGCAGATA 
801 AGCTGCAGGA GCGACTGCAG GACATCCTCA CTAACAGCAA GAGCCGCTCA 
851 GGCCTTGAGG AGCTGGTTCT CTCAGAGATG AACTCACCAA GCCGGACCCA 
901 GACAGGGGAC AGCAGTAGCA TCTCCTCCTT CAGCTACCGG GAGATCTTGC 
951 GGGAAAAGGA GAGCTCGGCT GTTCCAGCCA GGTCCTTATC CAGCAGCCCT 
1001 CAAGCCCAGC CCCCTCGGCC AGCAGAGCTG TCAGATGAGG AAGTGGCTGA 
1051 GCTCTTTCAG CGGCTGGCAG AGACACAGCA GGAGAAATGG ATGCTGGAGG 
1101 AGAAGGTGAA GCACCTGGAA GTGAGCAGTG CTTCCATGGC AGAGGACCTC 
1151 TGCCGGAAGA GCGCCATCAT TGAGACCTAC GTCATGGACA GCCGGATCGA 
1201 TGTGTCTGTG GCAGCAGGCC ACACAGACCG CAGCGGGCTG GGCAGCGTCC 
1251 TGAGAGACCT AGTGAAGCCA GGCGACGAGA ACCTTCGGGA GATGAACAAG 
1301 AAGCTGCAGA ACATGCTGGA GGAGCAGCTC ACCAAGAATA TGCACTTGCA 
1351 CAAGGATATG GAAGTTCTGT CCCAGGAAAT TGTGCGGCTC AGCAAGGAGT 
1401 GCGTGGGGCC TCCTGACCCA GACCTAGAGC CAGGAGAAAC CAGCTAAAGA 
1451 CCTGCAGGCT GCACCCACCT CCTCCCCTTC CTACCCCCTA GGATGCTATT 
1501 CCCTTGGGCT GTGGTGGAAA AATGAGGGCT GGAGCCAAAA TCAAATAGCT 
1551 TGGGAGACTG GACATTAAAG GGGCTAGAGG CCTGATGGTT AGTGTTAATG 
1601 ATCCTGTCTT AGGGCAGAGG CCACCAGGGA GTGGGGATCC TGAGGGAAGG 
1651 GGCAGGGATT TCTCCTTCTT CTTGGTCCTG GCTCCCAAGG GCTTCTGTCT 
1701 TCATCTCTGC ATGAGCTCTC CTTCCCAGAG ACCAACTCTT TTTTATTTTA 
1751 TTTTATTTTT TAATTTATGT CTGGAGCCTG GCTACTCTGC ATTTGGGATT 
1801 GGGGATGCTG GGTGGGTGTG TGGTCCATGT TCAGCGTTCT AGCAACACGT 
1851 GTGTGTGTGT GTGTGTAAAG GCTATGCAGC CAAAATACCA TCTGGCCAGA 
1901 CGGGCCCACC CACAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAG 



BLAST Results 

No BLAST result 

Medline entries 

No Medline entry 



Peptide information for frame 2 

ORF from 62 bp to 1444 bp; peptide length: 461 
Category: similarity to unknown protein 
Classification: no clue 

1 MGQKDDLNSQ LQESLRANSR LLEQLQEIGQ EKEQLTQELQ EARKSAEKRK 
51 AMLDELAMET LQEKSQHKEE LGAVRLRHEK EVLGVRARYE RELRELHEDK 
101 KRQEEELRGQ IREEKARTRE LETLQQTVEE LQAQVHSMDG AKGWFERRLK 
151 EAEESLQQQQ QEQEEALKQC REQHAAELKG KEEELQDVRD QLEQAQEERD 
201 CHLKTISSLK QEVKDTVDGQ RILEKKGSAA LKDLKRQLHL ERKRADKLQE 
251 RLQDILTNSK SRSGLEELVL SEMNSPSRTQ TGDSSSISSF SYREILREKE 
301 SSAVPARSLS SSPQAQPPRP AELSDEEVAE LFQRLAETQQ EKWMLEEKVK 
351 HLEVSSASMA EDLCRKSAII ETYVMDSRID VSVAAGHTDR SGLGSVLRDL 
401 VKPGDENLRE MNKKLQNMLE EQLTKNMHLH KDMEVLSQEI VRLSKECVGP 
451 PDPDLEPGET S 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_30p6, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_30p6, frame 2 

Report for DKFZphtes3_30p6.2 



ELENGThO Mfll 
ElllO 5S3ia.lD 
EpI3 5.07 

EH0N0L3 TREHBL s CEFmH10_M gene: "FmHlO-M", Caenorhabditis 

5 elegans cosmid FHIHIO - 2e-lH 

EFUNCAT3 30-03 organization of cytoplasm IS- cerevisiae-i 
YDLOSflw J Se-OM 

EFUNCATJ Ofl-0? vesicular transport (golgi network-! etc-) ES- 
cerevisiae-i YDL05flwJ 5e-04 
10 EBL0CKSJ BLD1100D NNnT/PNIIT/TEriT family of methyl transferases 
proteins 

EKIiO All_Alpha 

EKbU L0U_C0MPLEXITY n-13 '/. 

EKIiO C0ILED_C0IL MO-'Jb X 
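
The LOW_COMPLEXITY and COILED_COIL percentages above presumably correspond to the share of residues flagged in the SEG and COILS rows of the alignment block that follows. A minimal sketch of that calculation, assuming flagged residues are marked with "x" in SEG rows and "c" in COILS rows; this reading of the report format is an assumption, as is the helper name. Because OCR has discarded the column alignment of these rows, the sequence length is passed in explicitly:

```python
def masked_percent(mask_rows, sequence_length, flag="x"):
    """Percentage of residues flagged by `flag` across all mask rows."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    flagged = sum(row.lower().count(flag.lower()) for row in mask_rows)
    return 100.0 * flagged / sequence_length


# Two SEG rows with 15 and 11 flagged residues in a 481-residue report.
seg_rows = ["xxxxxxxxxxxxxxx xxxxxxxxxxx", ""]
print(round(masked_percent(seg_rows, 481), 2))
```

Calling the function with `flag="c"` on the COILS rows gives the corresponding coiled-coil share.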

15 

SE<2 Et3TTGLAAELc3tJ(3(3AEYEDLnG(3KDDLNS(3L(3ESLRANSRLLE(3L(3EIGt3EKE(2LT(3EL(3 

SEG xxxxxxxxxxxxxxx xxxxxxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
20 COILS 

. - -ccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SE<2 EARKSAEKRKAMLDELAMETLCEKSflHKEELGAVRLRHEKEVLGVRARYERELRELHEDK 

SEG x 

25 PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

cccccccc 

SE<2 KRQEEELRG(2IREEKARTRELETL(2(2TVEEL(2A(2VHSriDGAKGIilFERRLKEAEESL(2t3tl<3 

30 SEG xxxxxxxxxxxxxxxx xxxxxxxxxxx 

PR1> hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhh 
COILS 

CCCCCCCCCCCCCCCC 

SEU OEOEEALK(2CRE(JHAAELKGKEEELCDVRDi3LEC}AflEER])CHLKTISSLKt3EVKDTVDG(3 

SEG xxxxxxxxx .... 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccc 
COILS 

CCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEC RILEKKGSAALKDLKROLHLERKRADKLflERLfiDiLTNSKSRSGLEELVLSEMNSPSRTfl 

SEG 

PRD cccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccchhhhhhhhhhhccccccc 
COILS 

45 CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SE<2 TGDSSSISSFSYREILREKESSAVPARSLSSSPflAflPPRPAELSDEEVAELFflRLAETflfl 

SEG - - 'XXXXXXXX* • xxxxxxxxxxxxxxxxxxxxx 

PRD cccccccchhhhhhhhhhhcccccccccccccccccccccchhhhhhhhhhhhhhhhhhh 
50 COILS 



SE<3 EKUflLEEKVKHLEVSSASHAEDLCRKSAIIETYVflDSRIDVSVAAGHTDRSGLGSVLRDL 

SEG 

55 PRD hhhhhhhhhhhhhhchhhhhhhhhhhhhhhcccccchhhhhhhccccccccccccccccc 
COILS 
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SEG VKPGDENLREnNKKLGNnLEEGLTKNIIHLHKDriEVLSGEIVRLSKECVGPPDPDLEPGET 

SEG 

PRD ccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccccccc 
COILS 

5 CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEG S 
SEG - 
PRD c 
10 COILS 



(No Prosite data available for DKFZphtes3_30p6.2) 
(No Pfam data available for DKFZphtes3_30p6.2) 

DKFZphtes3_31a10 

group: nucleic acid management 

DKFZphtes3_31a10 encodes a novel 542 amino acid protein with 
similarity to histone H1 of Drosophila hydei. 

Histone H1 variants are known to act as specific regulators of 
genes via the differential condensation of DNA. 

The new protein can find application in modulating/blocking 
transcriptional activity and in expression profiling. 

weak similarity to Drosophila histone H1 

perhaps complete cds. 

Sequenced by LMU 

Locus: /map="13" 

Insert length: 2887 bp 

Poly A stretch at pos. 2855, polyadenylation signal at pos. 2839 

1 AGATGATCCC CAAAGTCAAC ATATGACATT AAGCCAGGCA TTTCACCTTA 
51 AAAACAATAG TAAAAAGAAA CAAATGACTA CAGAAAAACA AAAGCAAGAT 
101 GCTAACATGC CCAAGAAACC TGTGCTTGGA TCTTATCGTG GCCAGATTGT 
151 TCAGTCTAAG ATTAATTCAT TTAGAAAACC TCTACAAGTC AAAGATGAGA 
201 GTTCTGCAGC AACAAAGAAA CTTTCAGCCA CTATACCTAA AGCCACAAAA 
251 CCTCAGCCTG TAAACACCAG CAGTGTAACA GTGAAAAGTA ATAGATCCTC 
301 CAATATGACT GCCACTACTA AATTTGTGAG CACTACATCT CAGAACACAC 
351 AACTTGTGCG ACCTCCTATT AGAAGTCATC ACAGTAATAC CCGGGACACT 
401 GTGAAACAAG GCATCAGTAG AACCTCTGCC AATGTTACAA TCCGGAAAGG 
451 GCCTCATGAA AAAGAACTAT TACAATCAAA AACAGCTTTA TCTAGTGTCA 
501 AAACCAGTTC TTCTCAAGGT ATAATAAGAA ATAAGACTCT ATCAAGATCC 
551 ATAGCATCTG AAGTTGTAGC CAGGCCTGCT TCATTGTCTA ATGATAAACT 
601 GATGGAAAAG TCAGAGCCCG TTGACCAGCG AAGACATACT GCAGGAAAAG 
651 CAATTGTTGA TAGTAGATCA GCTCAGCCCA AAGAAACCTC GGAAGAGAGA 
701 AAAGCTCGTC TGAGTGAGTG GAAAGCTGGC AAAGGAAGAG TGCTAAAAAG 
751 GCCCCCTAAT TCAGTAGTTA CTCAGCATGA GCCTGCAGGA CAAAATGAAA 
801 AACTAGTTGG GTCTTTTTGG ACTACCATGG CAGAAGAAGA TGAACAAAGA 
851 TTATTTACTG AAAAAGTAAA CAACACATTT TCTGAATGCC TGAACTTGAT 
901 TAATGAGGGA TGTCCAAAAG AAGATATACT GGTCACACTG AATGACCTGA 
951 TTAAAAATAT TCCAGATGCC AAAAAGCTTG TTAAGTATTG GATATGTCTT 
1001 GCACTTATTG AACCAATCAC AAGTCCTATT GAAAATATTA TTGCAATCTA 
1051 TGAGAAAGCC ATTCTGGCAG GGGCTCAGCC TATTGAAGAG ATGCGACACA 
1101 CGATTGTAGA TATTCTAACA ATGAAGAGTC AAGAAAAAGC TAATTTAGGA 
1151 GAAAATATGG AGAAGTCTTG TGCAAGCAAG GAAGAAGTCA AAGAAGTCAG 
1201 TATTGAAGAT ACAGGTGTTG ATGTAGATCC AGAAAAACTG GAAATGGAGA 
1251 GTAAACTTCA TAGAAATTTG CTATTTCAAG ATTGTGAAAA AGAGCAAGAC 
1301 AACAAAACAA AAGATCCAAC CCATGATGTT AAAACCCCCA ATACAGAAAC 
1351 GAGGACAAGT TGCTTAATTA AATATAATGT GTCTACTACG CCATACTTGC 
1401 AAAGTGTGAA AAAAAAGGTG CAGTTTGATG GAACAAATTC CGCATTTAAA 
1451 GAGCTGAAGT TTTTAACACC AGTGAGACGT TCTCGACGTC TTCAAGAGAA 
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1SD1 A ACTTCTAAA TTGCCAGAT A 
1551 CATTGGAACA GCTAACGGAG 
lbDl CGCCCTAATG CAGCACTGTG 
ItSl AGAG A A AT AA AGCTCTGTTA 
5 1701 TTTTGTTTTG AGTAGCTTTA 
1751 ACCT ATGTAT CCTAAGCATT 
1601 TGTTATGGCA AGAGTTGTCC 
1351 AGTTTCAACC AACTGAGTTT 
MDl AGTTTACTAT GTTCCTTGAA 

10 1T51 CTTTACTAAA TATAAGTACA 
20D1 TCTAGGTAAA ATGTATGTTT 
2051 ACCTCCCTGC CTTTAAACAG 
2101 AGATTAGTCA AAAATTCTAT 
5151 GGAGGTTTAG CCTGCTTTCT 

15 22D1 CTCTTGCGTC CCTTGGACTG 
2251 ACACTTTTCG TCAGTAGTCT 
2301 GGGGTCACCA AGAAGGTTTA 
2351 ATACTTTGTG TAAGAAAAGA 
2M01 TTCACTTGAT AGTTTTTAAG 

20 2MS1 TCATACTGAA CAAATGTCAT 
2501 GATACTAATA CTTGTTTTCT 
2551 GTCATGTCCC TTGAAACATG 
2L.01 ATAAATAACA CCACTAAAGT 
2b51 TATATCAACT CTACAGTTTC 

25 27D1 AGTTGAAAAA CTGTATGAAT 
2751 TTCATTCCAC TTTGTATATC 
2fiDl AGAGTAGGAC TTTTATTCCG 
2fi51 TGTATAATTA AAAAAAAAAA 



TGTTAA AAGA 
TTGGGA AG AG 
CCGGGTGTAC 
GGG AATGGGG 
TATTGCTCTT 
CACGGCAGTG 
TCTACATTGG 
TTTCTTTAAG 
TATAAACAGG 
GTAATG ATGC 
GCCTTGAC AT 
AATACTTTTT 
AGA ATGACTC 
TACCAAATTC 
CCTGTTGATT 
GTAGTTTCGT 
CTTAATTAAA 
TGCCACATTT 
CAATTAGAAT 
TCTAGTTTAG 
TCCCTATAAC 
ATAGTTACAT 
TGTTTTGTAA 
AAATAAATGA 
GTGAAGATCA 
TTTTCTATTT 
TGTACCTGAT 
AAAAAAAAAA 



TCATTATCCT 
AAACTGATGC 
TATGAGGCTG 
TTTTTATTAT 
AGGTCTGGAG 
AGCTCCTTTA 
AAAGCTAATC 
AAAGGTAAAT 
TTATAATACT 
ATAATTAGAA 
GTTTTTAAAA 

ACTTCGAATA 
ATGTTACCCA 
GATGGA AAGT 
GGCCTCTTTT 
TACCGCATTT 
AGTGGTTTAA 
GGAGTTAGGG 
ATAGCATTTC 
ATAAAAAACT 
ACACAGTTTT 
GGTTCCAAAC 
CTTTTTAATT 
CATGCTTAGT 
ATTGACTTCT 
ATATATACAA 
AAAAAAG 



TGTGTGTCTT 
TTTTGTATGC 
ATACAACATA 
TTGTGGGGTG 
TTGGCCATGT 
CTAACATTCA 
CTACCTTGTC 
TTTGTCAGCT 
ACCCTGTTCA 
AATGAGGTAT 
GTTATGATGT 
GGCCTTTCTC 
CTAAGACACA 
GACTTGTGTT 
GTCTGCACTG 
GATTATAACT 
CTAAGAGAAG 
CTTTTGTAAC 
AAAGAACATA 
TAAGATAACT 
TCACTGTTAA 
CTCTCCACAC 
TAATATGGCA 
GTAAAAGATT 
CATTTTTATG 
CATGTTCTAG 
TTAAAATATC 
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BLAST Results 



No BLAST result 

Medline entries 

No Medline entry 

Peptide information for frame 2 

ORF from 23 bp to 1648 bp; peptide length: 542 
Category: similarity to known protein 
Classification: unclassified 

1 MTLSQAFHLK NNSKKKQMTT EKQKQDANMP KKPVLGSYRG QIVQSKINSF 
51 RKPLQVKDES SAATKKLSAT IPKATKPQPV NTSSVTVKSN RSSNMTATTK 
101 FVSTTSQNTQ LVRPPIRSHH SNTRDTVKQG ISRTSANVTI RKGPHEKELL 
151 QSKTALSSVK TSSSQGIIRN KTLSRSIASE VVARPASLSN DKLMEKSEPV 
201 DQRRHTAGKA IVDSRSAQPK ETSEERKARL SEWKAGKGRV LKRPPNSVVT 
251 QHEPAGQNEK LVGSFWTTMA EEDEQRLFTE KVNNTFSECL NLINEGCPKE 
301 DILVTLNDLI KNIPDAKKLV KYWICLALIE PITSPIENII AIYEKAILAG 
351 AQPIEEMRHT IVDILTMKSQ EKANLGENME KSCASKEEVK EVSIEDTGVD 
401 VDPEKLEMES KLHRNLLFQD CEKEQDNKTK DPTHDVKTPN TETRTSCLIK 
451 YNVSTTPYLQ SVKKKVQFDG TNSAFKELKF LTPVRRSRRL QEKTSKLPDM 
501 LKDHYPCVSS LEQLTELGRE TDAFVCRPNA ALCRVYYEAD TT 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_31a10, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_31a10, frame 2 

Report for DKFZphtes3_31a10.2 



ELEN6THJ SMI 

EMIO blb??-3fc 

EpIJ T.33 

EKIO Alpha_Beta 

25 CKUJ L0U_C0MPLEXITY S-IT '/. 

SEfl DPP(3Si2HnTLS(3AFHLKNNSKKK(2I1TTEK(3K£!D ANHPKKPVLGSYRG(JIV(3SKINSFRKP 

SEG xxxxxxxxxxxx 

30 PRD ccccccchhhhheeeeccccccchhhhhhhhhccccccccccccccceeeeccccccccc 

SE<2 Li2VKDESSAATKKLSATIPKATKP<3PVNTSSVTVKSNRSSNf1TATTKFVSTTSf2NTc2LVR 
SEG ... 

PRD cccccchhhhhhhhhhccccccccccccceeeeeccccccccccceeeeeccccceeeec 

35 ... . 

SE<3 PPIRSHHSNTRDTVK(3GISRTSANVTIRKGPHEKELL(2SKTALSSVKTSSS<2GIIRNKTL 
SEG 

PRD cccccccccccccccccccccceeeeeccccchhhhhhhhhhcccccccccceeecccch 

40 SEC SRSIASEVVARPASLSNDKLPIEKSEPVDflRRHTAGKAIVDSRSAGPKETSEERKARLSElj 
SEG ... 

PRD hhhhhheeeecccccchhhhhhhcccchhhhhhcceeecccccccccchhhhhhhhhhhh 

SEfl KAGKGRVLKRPPNSVVTflHEPAGflNEKLVGSFIilTTriAEEDEflRLFTEKVNNTFSECLNLI 
45 SEG 

PRD hcccceeeeccccceeeeccccccceeeeeecchhhhhhhhhhhhhhhhccccccceeec 

SEfl NEGCPKEDILVTLNDLIKNIPDAKKLVKYUICLALIEPITSPIENIIAIYEKAILAGAflP 
SEG 

50 PRD ccccccceeeeecccceeecccchhhhhhhhhhhhcccccccchhhhhhhhhhhhhcchh 

SEfl IEEMRHTIVDILTnKSflEKANLGENMEKSCASKEEVKEVSIEDTGVDVDPEKLEMESKLH 
SEG 

PRD hhhhhhhhhhhhhhhhhhhhhccchhhhhcccccceeGeeecccccccccchhhhhhhhh 



SEfl RNLLFflDCEKEflDNKTKDPTHDVKTPNTETRTSCLIKYNVSTTPYLflSVKKKVflFDGTNS 

SEG 

PRD cccccccccccccccccccccccccccccccceeeeeeecccccchhhhhhheeecccch 
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SE<3 AFKELKFLTPVRRSRRLt3EKTSKLPI>riLKDHYPCVSSLE(3LTELGRETDAFVCRPNAALC 

SEG 

PR3> hhhhhhhchhhhhhhhhhhhhhccccccccccccchhhhhhhhhcccccGGGGCCcceee 

SEG RVYYEADTT 

SEG 

PRD eeeeccccc 



(No Prosite data available for DKFZphtes3_31a10.2) 
(No Pfam data available for DKFZphtes3_31a10.2) 

group: signal transduction 

DKFZphtes3_31j20 encodes a novel 392 amino acid protein that 
contains a protein phosphatase 2C motif. 

The novel protein shares 15/C identity with the rat protein 
phosphatase 2C and is expressed ubiquitously. PP2C is a 
structurally diversified protein phosphatase family with a wide 
range of functions in cellular signal transduction. The 
transcription of the PP2Cdelta gene was activated in response to 
stress, like alcohol or UV irradiation. PP2C plays a role in cell 
cycle control. 

The new protein can find application in the diagnosis/therapy 
of stress-related diseases and cancer, as well as for 
modulation of cell cycle and signal transduction. 

strong similarity to protein phosphatase 2C (Rattus norvegicus) 

Sequenced by LMU 

Locus: unknown 

Insert length: 1436 bp 
Poly A stretch at pos. 1367, polyadenylation signal at pos. 1341 



1 CGCTGCTCGC GGGCTGAGTG TCTGTCGCTG CTGCCGCCTC CACCCAGCCT 
51 CCGCCATGGA CCTCTTCGGG GACCTGCCGG AGCCCGAGCG CTCGCCGCGC 
101 CCGGCTGCCG GGAAAGAAGC TCAGAAAGGA CCCCTGCTCT TTGATGACCT 
151 CCCTCCGGCC AGCAGTACTG ACTCAGGATC AGGGGGACCT TTGCTTTTTG 
201 ATGATCTCCC ACCCGCTAGC AGTGGCGATT CAGGTTCTCT TGCCACATCA 
251 ATATCCCAGA TGGTAAAGAC TGAAGGGAAA GGAGCAAAGA GAAAAACCTC 
301 CGAGGAAGAG AAGAATGGCA GTGAAGAGCT TGTGGAAAAG AAAGTTTGTA 
351 AAGCCTCTTC GGTGATCTTT GGTCTGAAGG GCTATGTGGC TGAGCGGAAG 
401 GGTGAGAGGG AGGAGATGCA GGATGCCCAC GTCATCCTGA ACGACATCAC 
451 CGAGGAGTGT AGGCCCCCAT CGTCCCTCAT TACTCGGGTT TCATATTTTG 
501 CTGTTTTTGA TGGACATGGA GGAATTCGAG CCTCAAAATT TGCTGCACAG 
551 AATTTGCATC AAAACTTAAT CAGAAAATTT CCTAAAGGAG ATGTAATCAG 
601 TGTAGAGAAA ACCGTGAAGA GATGCCTTTT GGACACTTTC AAGCATACTG 
651 ATGAAGAGTT CCTTAAACAA GCTTCCAGCC AGAAGCCTGC CTGGAAAGAT 
701 GGGTCCACTG CCACGTGTGT TCTGGCTGTA GACAACATTC TTTATATTGC 
751 CAACCTCGGA GATAGTCGGG CAATCTTGTG TCGTTATAAT GAGGAGAGTC 
801 AAAAACATGC AGCCTTAAGC CTCAGCAAAG AGCATAATCC AACTCAGTAT 
851 GAAGAGCGGA TGAGGATACA GAAGGCTGGA GGAAACGTCA GGGATGGGCG 
901 TGTTTTGGGC GTGCTAGAGG TGTCACGCTC CATTGGGGAC GGGCAGTACA 
951 AGCGCTGCGG TGTCACCTCT GTGCCCGACA TCAGACGCTG CCAGCTGACC 
1001 CCCAATGACA GGTTCATTTT GTTGGCCTGT GATGGGCTCT TCAAGGTCTT 
1051 TACCCCAGAA GAAGCCGTGA ACTTCATCTT GTCCTGTCTC GAGGATGAAA 
1101 AGATCCAGAC CCGGGAAGGG AAGTCCGCAG CCGACGCCCG CTACGAAGCA 
1151 GCCTGCAACA GGCTGGCCAA CAAGGCGGTG CAGCGGGGCT CGGCCGACAA 
1201 CGTCACTGTG ATGGTGGTGC GGATAGGGCA CTGAGGGGTG GCGCGCGGCC 
1251 AGGAGCACGC ATGGTATTGA CTTAAAAGGT TCATTTTGTG TGTGTGCACA 
1301 TTGTGTGTTT TGTGTACTCC TGTGGGACTC CCATGGTTGT AAATAAAGGT 
1351 TTCTCTTTTT TTTTCCTAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
1401 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAG 

BLAST Results 

No BLAST result 

Medline entries 

^074314: 

Tong Y, Quirion R, Shen SH; Cloning and characterization of a novel mammalian PP2C isozyme. J Biol Chem 1998 Dec 25;273(52):35282-90 

Peptide information for frame 2 

ORF from 56 bp to 1231 bp; peptide length: 392 
Category: strong similarity to known protein 
Classification: Protein management 
Prosite motifs: PP2C (147-155) 

1 MDLFGDLPEP ERSPRPAAGK EAQKGPLLFD DLPPASSTDS GSGGPLLFDD 
51 LPPASSGDSG SLATSISQMV KTEGKGAKRK TSEEEKNGSE ELVEKKVCKA 
101 SSVIFGLKGY VAERKGEREE MQDAHVILND ITEECRPPSS LITRVSYFAV 
151 FDGHGGIRAS KFAAQNLHQN LIRKFPKGDV ISVEKTVKRC LLDTFKHTDE 
201 EFLKQASSQK PAWKDGSTAT CVLAVDNILY IANLGDSRAI LCRYNEESQK 
251 HAALSLSKEH NPTQYEERMR IQKAGGNVRD GRVLGVLEVS RSIGDGQYKR 
301 CGVTSVPDIR RCQLTPNDRF ILLACDGLFK VFTPEEAVNF ILSCLEDEKI 
351 QTREGKSAAD ARYEAACNRL ANKAVQRGSA DNVTVMVVRI GH 



BLASTP hits 
No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_31j20, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_31j20, frame 2 

Report for DKFZphtes3_31j20.2 

ELENGTH3 WD 
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enibo ^751.05 

EpIJ 7- c J5 

EHONOLD TREMBL : AFD1S^B7_1 product: "protein phosphatase 

BC"=i Rattus norvegicus protein phosphatase EC mRNA-* complete cds. 
5 D-D 

EFUNCAT3 03-01 cell growth ITS. cerevisiae-i YDLOObwU be-B5 
EFUNCATIB 10-03.13 key phosphatases ES- cerevisiae-i YDLOObwl 

be-25 

EFUNCATJ QT-lb mitochondrial biogenesis ES. cerevisiae-i 

10 YDLOObwID be-B5 

EFUNCAT3 11-D1 stress response ES- cerevisiae-i YDLODbwJ be-B5 

EFUNCAT J 03- D 1 * budding-i cell polarity and filament formation 
ES- cerevisiae-. YDLOObwD be-E5 

EFUNCATID 01-05-OM regulation of carbohydrate utilization ES - 
15 cerevisiae-i YDLODbwJ be-E5 

EFUNCAT3) Tfi classification not yet cleai — cut ES • cerevisiae-r 

YEROfl^cJ le-B3 

EFUNCATID unclassified proteins ES - cerevisiae-. YORDTOcJ 

le-lB 

20 EFUNCAT2 03.BB cell cycle control and mitosis ES. cerevisiae-i 
YJLOOSwJ 3e-10 

EFUNCATID 03-10 sporulation and germination ES - cerevisiae-i 
YJLOOSwJ 3e-10 

EFUNCATID 30-OB organization of plasma membrane ES- cerevisiae-i 
25 YJLOOSwI 3e-10 

EFUNCATJ 01-03-10 metabolism of cyclic and unusual nucleotides 
ES- cerevisiaei YJLOQSwJ 3e-10 

EFUNCATID 10-O l 4-Q3 second messenger formation ES- cerevisiaei 

YJL0D5w3 3e-10 
30 EBLOCKSJ PR010B3F 

EBLOCKSJ PR00b?7D 

EBLOCKSJ BL0103BI 

EBLOCKSJ BLD103BH 

EBLOCKSJ BLD1D3BG 
35 EBLOCKSJ BL0103BC Prot_ein phosphatase BC proteins 

EBLOCKSJ BL01033B Protein phosphatase BC proteins 

ESC0P3 dlabq ^^a-l-l-l Protein serine/threonine 

phosphatase BC EHuma le-107 

EECJ 3-1-3-M3 EPyruvate dehydrogenase C lipoamide)J- 

40 phosphatase Se-OT 

EECJ 3-1-3-lb Phosphoprotein phosphatase 7e-35 

EECJ M-b-1-1 Adenylate cyclase Be-11 

EPIRKIO duplication 5e-ll 

EPIRKIO tandem repeat fie-OT 

45 EPIRKIO serine/threonine-specif ic phosphatase Be-B7 

EPIRKIO magnesium be-Eb 

EPIRKIO cAFIP biosynthesis Se-11 

EPIRKIO liver Be-E7 

EPIRKIO leucine zipper le-Ofl 

50 EPIRKID mitochondrion 3e-0T 

EPIRKIO phosphoric monoester hydrolase 7e-35 

EPIRKIO phosphorus-oxygen lyase Be-11 

ESUPFAM3 leucine-rich alpha-B-glycoprotein repeat homology Be-11 

55 ESUPFAM3 yeast adenylate cyclase catalytic domain homology Be-11 

ESUPFAPU kinase interaction domain homology 3e-ll 
ESUPFAM3 yeast adenylate cyclase 5e-ll 
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EPROSITEU PPEC 1 

EPF AMID Protein phosphatase EC 

CKIO Alpha_Beta 

5 

SE(2 A ARGLSVCRCCRLHPASANDLFGDLPEPERSPRPAAGKE AC2KGPLLFDDLPPASSTDSGS 
PRD ccceeeeeeeeccccccceeeecccccccccccccccccccccccccccccccccccccc 

SEfl GGPLLFDPLPPASSCDSGSLATSIStSMVKTEGKGAKRKTSEEEKNGSEELVEKKVCKASS 
10 PRD ccceeeccccccccccccccccccccccccccccccccccccccccccccccccccccce 

SEtf VIFGLKGYVAERKGEREEri(3DAHVILNI>ITEECRPPSSLITRVSYFAVFDGHGGIRASKF 
PRD eeeceeeecchhhhhhhhhhhhheeeeccccccccccccccceeeeeeeccccchhhhhh 

15 SE<3 AA(2NLHt3NLIRKFPKGDVISVEKTVKRCLLDTFKHTDEEFLK(3ASS(3KPAlJKDGSTATCV 
PRD hhhhhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccceeee 



20 



30 



SE<3 LAVDNILYIANLGDSRAILCRYNEES(3KHAALSLSKEHNPT(3YEERriRI(2KAGGNVRDGR. 

PRD eeccceeeeeccccceeeeeeccccccccceeeeecccccccchhhhhhhhcccceeeee 

SE(2 VLGVLEVSRSIGDG(3YKRCGVTSVPDIRRC(3LTPNDRFILLACDGLFKVFTPEEAVNFIL 

PRD ccccceeeeccccccccccccccccccccccccccceeeeeecccccccccchhhhhhhh 



SE(3 SCLEDEKIflTREGKSAADARYEAACNRLANKAVflRGSADNVTVNVVRIGH 
25 PRD hhhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhhccccccceeeeeeccc 



Prosite for DKFZphtes3_31j20.2 
PS01032 165->174 PP2C PDOC00792 

Pfam for DKFZphtes3_31j20.2 



HW1_NA11E Protein phosphatase EC 
40 HUM 

*GlCcN<2GPRIilRMsriEDaHiaylNF - pcnlDUUhiHFFGVFDGHg 

+ + + +G R + + I1+DAH+ + ++ P ++L + + 

+++F+VFDGHG 

duery lEfi YVAERKG — EREEN<2DAHVILNDITEECRPPSSLITR- 

45 VSYFAVFDGHG 173 

HMfl GDGCSGUCgeHUHdll* 

G+++S++ +++H+ + 
Query 17M GIRASKFAA<2NLH(2NL Ifi^ 

50 



DKFZphtes3_5k22 






group: signal transduction 

DKFZphtes3_5k22 encodes a novel 455 amino acid protein with 
similarity to human paraneoplastic neuronal antigen MA1. 

Antibodies against MA1 were found in patients with 
paraneoplastic neurological disorders. The protein is 
predominantly expressed in testis and brain, but ESTs are also 
found in liver, lung, uterus and kidney. 

The new protein can find application in the study and therapy of 
paraneoplastic neurological disorders. 



strong similarity to paraneoplastic neuronal antigen MA1 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 3534 bp 

Poly A stretch at pos. 3514, polyadenylation signal at pos. 3494 
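
The positions of a poly A stretch and a polyadenylation signal, as reported for each insert, can be located with a simple motif search. The listing does not describe how the annotation pipeline found them, so the following is only a sketch under stated assumptions: the canonical hexamer AATAAA is taken as the signal, a run of at least ten A residues is taken as the poly A stretch, and positions are 1-based. The function names and the toy sequence are hypothetical.

```python
import re


def find_polya_signals(seq: str, motif: str = "AATAAA") -> list[int]:
    """Return 1-based start positions of every (possibly overlapping) motif hit."""
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [m.start() + 1 for m in pattern.finditer(seq.upper())]


def find_polya_stretch(seq: str, min_len: int = 10):
    """Return the 1-based start of the first run of >= min_len A's, or None."""
    m = re.search("A{%d,}" % min_len, seq.upper())
    return m.start() + 1 if m else None


# Hypothetical toy insert: a short 3' end followed by a poly A tail.
toy = "GGCTTCAATAAAGTGTT" + "A" * 20
print(find_polya_signals(toy))
print(find_polya_stretch(toy))
```

The reported positions in a listing may follow a slightly different convention (for example, pointing at the base before the motif), so an offset of one between such a search and the printed value is not necessarily an error.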



1 GAACGTCCGC GCTGGGAGCC AGGGGTGCCC GACCCCCGTC CGCCGCCGCC 
51 GCCGCCGCCG CGCATAGCCC CCGGAGAGCC CTCTGGGGAC CCCGACCAGA 
101 AGGGACCTTG CCCTGGGAGA AGGCTGTGGA GACCTGGGCC TTCTGCGATC 
151 ACCCTAGGAG TTGATCCAGA TATGTGCCTC ACGCCCTGAT CACTCCCCCC 
201 AAATTAGTAT CCGCAGAGAT TCGAGGACAT GCCGTTGACC TTGTTACAGG 
251 ACTGGTGTCG GGGGGAACAC CTGAACACCC GGAGGTGCAT GCTCATCCTG 
301 GGGATCCCCG AGGACTGTGG CGAGGATGAG TTTGAGGAGA CACTCCAGGA 
351 GGCTTGCAGG CACCTGGGCA GATACAGGGT GATTGGCAGG ATGTTTAGGA 
401 GGGAGGAGAA CGCCCAGGCG ATTCTACTGG AGCTGGCACA AGATATCGAC 
451 TATGCTTTGC TCCCAAGGGA AATACCAGGA AAGGGGGGGC CCTGGGAAGT 
501 GATTGTAAAA CCCCGTAAGT CAGATGGGGA ATTTCTCAAC AGACTGAACC 
551 GCTTCTTAGA GGAGGAGAGG CGGACCGTGT CAGATATGAA CCGAGTCCTC 
601 GGGTCGGACA CCAATTGTTC GGCTCCAAGA GTGACTATAT CACCAGAGTT 
651 CTGGACCTGG GCCCAGACTC TGGGGGCAGC AGTGCAGCCT CTGCTAGAAC 
701 AAATGTTGTA CCGAGAACTA AGAGTGTTTT CTGGGAACAC CATATCCATC 
751 CCAGGTGCAC TGGCCTTTGA TGCCTGGCTT GAGCACACCA CTGAGATGCT 
801 ACAGATGTGG CAGGTGCCCG AGGGGGAAAA GAGGCGGAGG CTGATGGAAT 
851 GCTTACGGGG CCCTGCTCTC CAGGTGGTCA GTGGGCTCCG GGCCAGCAAT 
901 GCTTCCATAA CTGTGGAGGA GTGCCTGGCT GCCTTGCAGC AGGTGTTCGG 
951 ACCTGTGGAG AGCCATAAAA TTGCCCAGGT GAAGTTGTGT AAAGCCTATC 
1001 AGGAGGCAGG AGAGAAAGTA TCTAGCTTTG TGTTACGTTT GGAACCCCTG 
1051 CTCCAAAGAG CTGTAGAAAA CAATGTGGTA TCACGTAGAA ACGTGAATCA 
1101 GACTCGCCTG AAACGAGTCT TAAGTGGGGC CACCCTTCCT GACAAACTCC 
1151 GAGATAAGCT TAAGCTGATG AAACAGCGAA GGAAGCCTCC TGGTTTCCTG 
1201 GCCCTGGTGA AGCTCCTGCG TGAGGAGGAG GAATGGGAGG CCACTTTAGG 
1251 TCCAGATAGG GAGAGTCTGG AGGGGCTGGA AGTAGCCCCA AGGCCACCTG 
1301 CCAGGATCAC TGGGGTTGGG GCAGTACCTC TCCCTGCCTC TGGCAACAGT 
1351 TTTGATGCGA GGCCTTCCCA GGGCTACCGG CGCCGGAGGG GCAGAGGCCA 
1401 ACACCGAAGG GGTGGTGTGG CAAGGGCTGG CTCTCGAGGC TCAAGAAAAC 
1451 GGAAACGCCA CACATTCTGC TATAGCTGTG GGGAAGACGG CCACATCAGG 
1501 GTACAGTGCA TCAACCCCTC CAACCTGCTC TTGGCCAAGG AGACAAAAGA 
1551 GATATTGGAA GGAGGGGAAA GAGAAGCCCA GACAAACAGC AGATGAGTTG 
1601 AGTGGGGCAG AGGGACAGGG CAGCCAGACC AAGGCCAAGC CTTCTCACCC 
1651 TTGGCCAGCT GGAAGGGACT TCAGCAACCA AGACCACCTG GCAACAGGCT 
1701 CAG1GGGGG1 CAGGTCCAGG TCCCCGAAGA GGTGCTGGAG AGGAAAGCAG 
1751 GGAGCCACTG CATCCAGCAC ATGGGGTGCC TGGGCCTCAG ATGGGGACCC 
1801 CAAAGAAGCA GAAGCTGAAG AAGGTACGGC TGGGGGTTCT GTCCTGCTCA 
1851 TCCAACCACC CCTAAATACC CACCCTGTGG ACTTTGAGCT GAACATGCCC 
1901 ACTGGCCCCC AGGCCACATG GGACCTGGAG GAGCCTACCT GGGGCCTGCC 
1951 CCTGCCAGCA GGTGCCAGGG CTGGTGAGGA AGAGCTGGGG GGCAGAGGTA 
2001 AAGCCCTGCA GGGGAGGCCA CAGGGTCCAT CCCGTCTTCA GGATCATCTA 
2051 CACTGCACTA GGGGAGCCCC AGGAAGGCAG CACCCTGGAG GCCCTGTGCC 
2101 AGTGAGGACA GGAGACCCTA AGGCCCCGGG AGCCCAGTGC CAGCCAGAGG 
2151 TTGTGCAGGC AAGGAGACCA AAGATTGATG AGAAGACCCC CAGCAGGGGT 
2201 ACTGGGTACC CGGCAGGCCA GTGCCCTCAC AGTTGACTTG GACCAGGGTG 
2251 GCTGTGAAGG GAAGTCTTTG TTGCAAAGGA GGAGGAGGAA AAGGGAGGAC 
2301 TTGGTAGGGT TTTGTTTCTT CTGCTTGTTT CTGTACAGGG CCACCAGACT 
2351 CCTGGAGAGA TCAAGCAAGG AGAACCTGGG GCTGCCATGG CCAAAGCAAC 
2401 TCAACAGATG CCAATGCCAA TTCCAAGGCC AGCCACAACC CTGCCACCTT 
2451 GGGGAATCCA GCCTGGAGGC ATCCCCTAAG CAGCCAGCCA TGGCCTGGGT 
2501 GGAGGCACCT GAAGACGTCT GTCCCAAACT CCCCCAGCCC TGAGCTGGGA 
2551 GATGACAGGG GGAAAGAGGC CCTCTCAAGG GTGCCAGATG CCTGGGTCTC 
2601 CCAAGAGGGG TCCCCCAACT CACCGTTCCC GGGACAGGCT GCCCCCTGTT 
2651 CCAGGAAGCT CATCCTCACC TGTGTAGGCC CCTGTAGTGA CCCACGCGTC 
2701 CAGCAGACGC CCACCCACCG CTAGCCGTTG TTCCTGTGCA AAGTAGTGTG 
2751 CTATGCACCC ACCCAGGTGG CCGCCTCTGG GCCCAAGGCA CATGCTGTGA 
2801 GCTTCCTGTG AGCCCAGGCT CTGCTCACTG CTGTCCCGCG TCATGAGCAC 
2851 CACCTCTGCT TTCCCTGTGT AGATCTAGGC CAGTGGCTGC TTGTTCTTGT 
2901 GGAGCTGTGT GTGTTCTTCT CTGAGCAGCT CCTCCCCGGA GTCCCCCAGC 
2951 ACAGTCCCAG GAGATGACAG GAAGGAAGCA CCAGGGCAAG GCGGACGCTC 
3001 ACCCTGTGAC CACGATGGTG ACCGTGGCTG TGGGAGGAAG AACTGGACCC 
3051 AGGACGGAGC GGGGCTGCCC TGCCTGAGGC TCCCGAGGAG CTTTGTGCTT 
3101 TGGTGTTCCA CCCCTGTTGT TACTCATGAC TCAGTTTCCT TGACCTGGTA 
3151 GGGTGTTCCC TGCTGTGTTT TCCAGTGTCC TGTGACTGTC CTGTGCGGGC 
3201 CATAGGGCAG GGCCCTGCCC CAGCAGATGG GCTTGGGAGG GGGCTCCCTA 
3251 AAGCCAGTGG ACACTGCCAG AGTCTACCTT CCTGGCAAGA GGCAGACCCC 
3301 GGGGCCCTCA GGAAGGAGGG AGTTGGCAGC GGGGGCTGCA GCAGGAGTAG 
3351 GAGCAGATGA GGCGTCTTGC CAGGAACCTC AGGAGGAGGG GGCCCGGGAC 
3401 CTGTGTGGGA CCTGTGTCCT GTGGTGGCCG TTTGCAGTTT CTCTCTGTGT 
3451 TGTGATTCCC TTCTCTTCAA TGGTTTCAGT ACGTGTTTCT CTTCAATAAA 
3501 CTTCATTCAG TGTTAAAAAA AAAAAAAAAA AAAA 



45 



BLAST Results 



No BLAST result 



50 



Medline entries 



^15fll?T: 

Ma1, a novel neuron- and testis-specific protein, is recognized by 
the serum of patients with paraneoplastic neurological disorders 
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Peptide information for frame 1 

5 

ORF from 229 bp to 1593 bp; peptide length: 455 

Category: strong similarity to known protein 
Classification: unclassified 

1 MPLTLLQDWC RGEHLNTRRC MLILGIPEDC GEDEFEETLQ EACRHLGRYR 
51 VIGRMFRREE NAQAILLELA QDIDYALLPR EIPGKGGPWE VIVKPRNSDG 
101 EFLNRLNRFL EEERRTVSDM NRVLGSDTNC SAPRVTISPE FWTWAQTLGA 
151 AVQPLLEQML YRELRVFSGN TISIPGALAF DAWLEHTTEM LQMWQVPEGE 
201 KRRRLMECLR GPALQVVSGL RASNASITVE ECLAALQQVF GPVESHKIAQ 
251 VKLCKAYQEA GEKVSSFVLR LEPLLQRAVE NNVVSRRNVN QTRLKRVLSG 
301 ATLPDKLRDK LKLMKQRRKP PGFLALVKLL REEEEWEATL GPDRESLEGL 
351 EVAPRPPARI TGVGAVPLPA SGNSFDARPS QGYRRRRGRG QHRRGGVARA 
401 GSRGSRKRKR HTFCYSCGED GHIRVQCINP SNLLLAKETK EILEGGEREA 
451 QTNSR 

20 

BLASTP hits 

25 No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_5k22, frame 1 

TREMBLNEW:AB020690_1 gene: "KIAA0883"; product: "KIAA0883 
protein"; Homo sapiens mRNA for KIAA0883 protein, complete cds., 
N = 1, Score = 725, P = 2.4e-71 

TREMBL:AF037364_1 gene: "MA1"; product: "paraneoplastic neuronal 
antigen MA1"; Homo sapiens paraneoplastic neuronal antigen MA1 
(MA1) mRNA, complete cds., N = 1, Score = 665, P = 2.6e-65 



40 
>TREMBLNEW:AB020690_1 gene: "KIAA0883"; product: "KIAA0883 
protein"; Homo sapiens mRNA for KIAA0883 protein, complete cds. 
Length = 364 

HSPs: 

Score = 725 (106.3 bits), Expect = 2.4e-71, P = 2.4e-71 
Identities = 156/348 (44%), Positives = 215/348 (61%) 
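
The percentages on the Identities and Positives line follow directly from the match counts and the alignment length. A minimal sketch, assuming the report truncates the percentage to a whole number (rounding would give 45% and 62% for these counts, so truncation is the reading consistent with the printed values); the function name `blast_percent` is illustrative only.

```python
def blast_percent(matches: int, aligned_length: int) -> int:
    """Percentage of aligned positions that match, truncated to an integer."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matches) // aligned_length


# Counts as read from the HSP above: 156 identities and 215 positives over 348 positions.
print(blast_percent(156, 348))
print(blast_percent(215, 348))
```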



fluery: 1 

MPLTLL(3DL)CRGEHLNTRRCMLILGIPEDCGE3>EFEETL(3EACRHLGRYRVIGRMFRREE bO 

M L LL + DWCR ++ ++ +++ GIP D E E +E L<2E + 
LGRYR++G++FR++E 
55 Sbjct: 1 

MALALLEDUCRIMSVI>E(3KSLMVTGIPAI>FEEAEI(3EVL(3ETLKSLGRYRLLGKIFRKr3E bD 
Query: fc>l 

NA(3AILLELA(3DIDYALLPREIP6KGGPU)EVIVKPRNSDGXXXXXXXXXXXXXXXTVSDn 12D 
NA A+LLEL +D D + + P E + GKGG U + VI K N D 

TVS fl 
5 Sbjct: bl 

NANAVLLELLEDTDVSAIPSEVt3GKGGVUKVIFKTPN£3DTEFLERLNLFLEKEG(3TVSGI1 120 

Query: 121 NRVLGSDTNCSAPRVTISPEFUTW— 
AfiTLGAAVdPLLEGflLYRELRVFSGNTISIPGAL 17fi 
10 R LG + A ISPE Q + A (3PLL il YR+LRVFSG+ 

+ P 

Sbjct: 121 FRALGCEGVSPATVPCISPELLAHLLGGAMAHAPtfPLLP- 
NRYRKLRVFSGSAVPAPEEE 17T 

15 Query: 17^ 

AFPAWLEHTTEI1L(3niJ(3VPEGEKRRRLnECLRGPAL(3VVSGLRASNASITVEECLAAL(3fl 23fl 
+ F+ ULE TE +++ LI V E EK + R L E LRGPAL ++ ++A N 

SI+VEECL A + <2 
Sbjct: IflO 

20 SFEVULE(3ATEIVKEUPVTEAEKKRWLAESLRGPALDLriHIVflAI>NPSISVEECLEAFK(3 23^ 
Query' 23=1 

VFGPVESHKIA(3VKLCKAY<2EAGEKVSSFVLRLEPLLflXXXXXXXXXXXXXXXXXLKRVL 2Tfl 
VFG +ES + AOV+ K Yt3E GEKVS++VLRLE LL + 

25 L++V+ 

Sbjct: 2M0 

VFGSLESRRTA(2VRYLKTY(2EEGEKVSAYVLRLETLLRRAVEKRAIPRRIAD(2VRLEt2Vf1 2^ 

Query: 2^ SGATLPDKLRDKLKLMKI3RRKPPGFLAL VKLLREEEEUEATLGPDRESLE 
30 3Hfi 

+GATL L +L+ +K + PP FL L+K++REEEE EA+ + ES + E 
Sbjct: 300 AGATLN(2NLlilCRLRELKD<3GPPPSFLEI_riKVIREEEEEEASF — ENESIE 
347 

35 

Pedant information for DKFZphtes3_5k22, frame 1 

Report for DKFZphtes3_5k22.1 



CLENGTHJ M55 

CMIO S151M-3M 

EpIH T-27 

[HOMOL] TREMBLNEW:AB020690_1 gene: "KIAA0883"; product: 
"KIAA0883 protein"; Homo sapiens mRNA for KIAA0883 protein, 
complete cds. 3e-75 

0EBLOCKS3 BLDDfl7bB Indoleamine 2 -i3-dioxygenase proteins 
CPFAI13 Zinc finger-. CCHC class 

50 EKIO AlphaJBeta 

QIKliU LOU.COflPLEXITY 13. m V. 



SE<3 MPLTLLt3DUCRGEHLNTRRCI1LILGIPEDCGEDEFEETL(3EACRHLGRYRVIGRMFRREE 

55 SEG 

PRD ccchhhhhccccccccccceeeeeecccccchhhhhhhhhhhhhhccceeehhhhhhhhh 

SE(2 NA(3AILLELA(3DIDYALLPREIPGKGGPWEVIVKPRNSDGEFLNRLNRFLEEERRTVSDf1 

SEC • xxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhcccccccccccccccceeeeeeecccccchhhhhhhhhhhhhhchhhhh 

SEfl NRVLGSDTNCSAPRVTISPEFUTUA(JTLGAAV(3PLLE<3riLYRELRVFSCNTISIPGALAF 

5 SEG 

PRD hhhhcccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhheeeccccccccchhhh 

SEfl DAULEHTTEHLl2nU<3VPEGEKRRRLMECLRGPAL<3VVSGLRASNASITVEECLAAL(J<2VF 

SEG 

10 PRD hhhhhhhhhhhhhhhccchhhhhhhhhhhhccccccccccccccceeehhhhhhhhhhhh 

SE<2 GPVESHKIA(JVKLCKAY(JEAGEKVSSFVLRLEPLL£2RAVENN VVSRRNVNdJTRLKRVLSG 

SEG xxxxxxxxxxxxxxxxx 

PRD hccchhhhhhhhhhhhhhhcccccceeeeehhhhhhhhhhhhcchhhhhhhhhhhhhhhc 



15 



30 



SEfl ATLPDKLRDKLKLMKQRRKPPGFLALVKLLREEEElilEATLGPDRESLEGLEVAPRPPARI 

SEG «...' 

PRD ccccchhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhcccchhhhheeeecccccccc 



20 SEfl TGVGAVPLPASGNSFD ARPSflGYRRRRGRGflHRRGGVARAGSRGSRKRKRHTFCYSCGED 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxx - 

PRD eeeoeeccccccccccccccccccccccccccceeeeeeccccccccccceeeeeccccc 

SEfl GHIRVflCINPSNLLLAKETKEILEGGEREAflTNSR 

25 SEG 

PRD ceeeeeeccccchhhhhhhhhhhcccccccccccc 



(No Prosite data available for DKFZphtes3_5k22.1) 

Pfam for DKFZphtes3_5k22.1 



35 Hnt"l_NAME - Zinc f inger, CCHC class - 

HMD *flkCl)NCGKPGHMI1RDCPE* 

C++CG+ GH+ +C + 
fiuery M12 TFCYSCGEDGHIRVflCIN 1*5=1 

40 



DKFZphtes3_7n12 






group: transmembrane protein 

DKFZphtes3_7n12 encodes a novel 703 amino acid protein without 
similarity to known proteins. 

The novel protein contains 1 transmembrane domain. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP 
motifs. 

The new protein can find application in studying the expression 
profile of testis-specific genes and as a new marker for 
testicular cells. 

putative protein 

contains transmembrane domain 
perhaps complete cds. 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 2347 bp 

Poly A stretch at pos. 2271, polyadenylation signal at pos. 2253 

30 

1 CGGCTGCAGT CTGGGCCGGG GCCCTGTGCC GCTGAAGACA TGGAGTTTGT 
51 GTCTGGATAC CGGGATGAGT TCCTTGATTT CACTGCCCTT CTCTTCGGCT 
101 GGTTCCGAAA GTTTGTGGCA GAGCGTGGAG CTGTAGGGAC TAGCCTTGAG 
151 GGCCGCTGCC GGCAGCTGGA GGCCCAGATC AGAAGGCTAC CCCAGGACCC 
201 TGCCCTTTGG GTGCTCCATG TCCTGCCCAA CCATAGTGTG GGCATCAGCC 
251 TGGGGCAAGG GGCAGAACCA GGTCCTGGAC CAGGCCTGGG GACTGCCTGG 
301 CTCCTGGGAG ACAACCCTCC ACTCCACCTG CGAGACCTGA GCCCCTACAT 
351 CAGCTTTGTC AGCCTAGAGG ATGGGGAGGA AGGGGAGGAG GAAGAGGAGG 
401 AAGATGAAGA AGAAGAGAAG AGAGAGGACG GGGGTGCAGG CAGCACAGAG 
451 AAGGTGGAAC CAGAGGAGGA CCGGGAGCTA GCCCCTACCA GCAGGGAGTC 
501 CCCCCAGGAA ACAAACCCTC CAGGAGAGTC AGAGGAGGCT GCCCGGGAGG 
551 CAGGAGGTGG CAAGGATGGC TGCCGAGAGG ACAGGGTGGA GAACGAAACA 
601 AGACCCCAGA AGAGGAAGGG ACAGAGGAGT GAGGCTGCCC CCCTGCACGT 
651 TTCCTGTCTC TTACTTGTGA CGGATGAGCA TGGCACCATC TTGGGCATTG 
701 ATCTGCTAGT GGATGGAGCC CAGGGAACCG CAAGCTGGGG CTCAGGGACC 
751 AAGGACCTGG CTCCTTGGGC CTATGCTCTC CTCTGTCACA GCATGGCCTG 
801 TCCCATGGGC TCTGGGGATC CCCGAAAGCC CCGACAGCTT ACTGTGGGAG 
851 ATGCCCGGCT GCATCGAGAG CTGGAGAGCT TGGTCCCAAG GCTAGGTGTG 
901 AAGTTAGCCA AAACCCCAAT GCGGACATGG GGTCCCCGGC CAGGCTTCAC 
951 CTTTGCTTCC CTTCGTGCTC GAACCTGCCA TGTGTGTCAC AGGCACAGCT 
1001 TTGAAGCGAA GCTGACACCT TGCCCCCAGT GTAGTGCTGT CTTGTATTGT 
1051 GGAGAGGCTT GTCTCCGGGC TGACTGGCAG CGGTGCCCAG ATGATGTGAG 
1101 TCACCGATTT TGGTGCCCAA GGCTTGCAGC CTTCATGGAG CGGGCAGGAG 
1151 AACTGGCAAC CCTACCTTTT ACCTACACCG CAGAGGTGAC CAGTGAAACC 
1201 TTCAACAAAG AGGCCTTCCT GGCCTCTCGG GGCCTCACTC GTGGCTATTG 
1251 GACCCAGCTC AGCATGCTGA TTCCAGGCCC GGGCTTCTCC AGACACCCCC 
1301 GAGGCAACAC GCCATCCCTC AGCCTTCTTC GCGGTGGAGA CCCCTACCAG 
1351 CTTCTCCAGG GAGACGGGAC TGCCCTGATG CCTCCTGTGC CCCCACATCC 
1401 ACCCCGGGGT GTTTTTGTCC CTGAGCTCAA CATCCAAAAC AAACAGTCAC 
1451 TGAAGATCCA CGTGGTGGAG GCCGGGAAGG AGTTTGACCT TGTCATGGTG 
1501 TTTTGGGAGC TTTTGGTCCT GCTCCCCCAT GTGGCCCTGG AGCTGCAGTT 
1551 TGTAGGTGAT GGCCTGCCCC CCGAAAGCGA CGAGCAGCAT TTTACCCTGC 
1601 AGAGGGACAG CCTGGAGGTG TCTGTCCGGC CTGGTTCCGG CATATCAGCA 
1651 CGGCCCAGCT CTGGCACTAA GGAGAAAGGG GGCCGCAGGG ACCTGCAGAT 
1701 CAAGGTGTCA GCAAGGCCCT ACCACCTGTT CCAGGGGCCC AAGCCTGACC 
1751 TGGTTATTGG ATTTAACTCC GGGTTTGCTC TCAAGGATAC GTGGCTGAGG 
1801 TCTCTGCCCC GGTTACAGTC CCTCCGAGTG CCAGCCTTCT TCACCGAGAG 
1851 CAGCGAGTAC AGCTGTGTGA TGGACGGCCA GACCATGGCG GTGGCCACTG 
1901 GAGGGGGCAC CAGCCCTCCC CAGCCCAACC CCTTCCGCTC CCCCTTTCGC 
1951 CTCAGAGCGG CCGACAACTG CATGTCCTGG TACTGCAATG CCTTCATCTT 
2001 CCACCTGGTT TACAAGCCTG CTCAAGGGAG CGGGGCCCGC CCGGCGCCCG 
2051 GGCCCCCACC CCCATCCCCA ACTCCCTCTG CTCCTCCTGC CCCCACCCGA 
2101 AGGCGCCGAG GAGAAAAGAA ACCTGGGCGG GGGGCCCGCC GGCGGAAATG 
2151 AATGCTGATA CCCTAGTAGT CCCCAGCTCC CAAACACTGA AAGGAAAACG 
2201 TGAAAACACT CAAGGCCTAG GGGGAGGACA GGTTGGTAAA ACATGAAAAG 
2251 GTAAATAAAA TTACTTGTTT GAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2301 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAA 



BLAST Results 



25 

No BLAST result 



Medline entries 

30 

No Medline entry 



35 •• • ... 

Peptide information for frame 1 



ORF from 40 bp to 2148 bp; peptide length: 703 
Category: putative protein 

Classification: Transmembrane proteins unclassified 

1 MEFVSGYRDE FLDFTALLFG WFRKFVAERG AVGTSLEGRC R<JLEA(2IRRL 

51 PflDPALUVLH VLPNHSVGIS LGcSGAEPGPG PGLGTAULLG DNPPLHLRDL 

45 101 SPYISFVSLE DGEEGEEEEE EDEEEEKRED GGAGSTEKVE PEEDRELAPT 

151 SRESPUETNP PGESEEAARE AGGGKDGCRE DRVENETRPfl KRKGfiRSEAA 

SD1 PLHVSCLLLV TDEHGTILGI DLLVDGAcJGT ASUGSGTKDL APWAYALLCH 

551 SMACPMGSGD PRKPRfiLTVG DARLHRELES LVPRLGVKLA KTPMRTUGPR 

3D1 PGFTFASLRA RTCHVCHRHS FEAKLTPCP(3 CSAVLYCGEA CLRAPU<3RCP 

50 351 DDVSHRFUCP RLAAFMERAG ELATLPFTYT AEVTSETFNK EAFLASRGLT 

MOl RGYUTflLSML IPGPGFSRHP RGNTPSLSLL RGGPPY(2LL(3 GDGTALMPPV 

451 PPHPPRGVFV PELNIfiNKflS LKIHVVEAGK EFDLVMVFWE LLVLLPHVAL 

501 ELdFVGDGLP PESDEflHFTL (2RDSLEVSVR PGSGISARPS SGTKEKGGRR 

551 DL(2IKVSARP YHLFflGPKPD LVIGFNSGFA LKDTULRSLP RLOSLRVPAF 

55 bOl FTESSEYSCV flDGflTMAVAT GGGTSPPQPN PFRSPFRLRA ADNCMSUYCN 

b51 AFIFHLVYKP AdGSGARPAP GPPPPSPTPS APPAPTRRRR GEKKPGRGAR 
701 RRK 






BLASTP hits 

5 No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_7n12, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_7n12, frame 1 
Report for DKFZphtes3_7n12.1 

15 

CLENGTH3 703 
IE II bO 77315.72 
ILpIJ b-45 
20 EKIO TRANSMEMBRANE 1 

K.KU1 L0lil_C0MPLEXITY 15-22 X 

SE<2 MEFVSGYRDEFLDFTALLFGUFRKFVAERGAVGTSLEGRCRGLEAQIRRLPflDPALUVLH 

25 SEG - 

PRD ccceeeccchhhhhhhhhhhhhhhhhhhhccccccchhhhhhhhhhhhhccccccccccc 
MEM - - 

SE<3 VLPNHSVGISLGflGAEPGPGPGLGTAWLLGDNPPLHLRDLSPYISFVSLEDGEEGEEEEE 

30 SEG xxxxxxxxxxx 

PRD cccccccccccccccccccccceeeeeecccccccccccccceeeeeeccccchhhhhhh 

MEM 

SEC EDEEEEKREDGGAGSTEKVEPEEDRELAPTSRESPflETNPPGESEEAAREAGGGKDGCRE 

35 SEG- xxxxxxxxxxxx — xxxxxxxxxxxxx 

PRD hhhhhhhhcccccccccccccccccccccccccccccccccchhhhhhhhccccccccce 

MEM 

SE<3 DRVENETRPflKRKGORSEAAPLHVSCLLLVTDEHGTILGIDLLVDGAfiGTASUGSGTKDL 

40 SEG 

PRD eeccccccccccccccccccchhhhhheeeecccccccchhhhhhccccccccccccccc 

MEM 

SE<2 APUAYALLCHSMACPMGSGDPRKPRdLTVGDARLHRELESLVPRLGVKLAKTPMRTWGPR 

45 SEG • 

PRD hhhhhhhhhhhhccccccccccccceeeecchhhhhhhhhhhcccccccccccccccccc 

MEM 

SEfi PGFTFASLRARTCHVCHRHSFEAKLTPCPflCSAVLYCGEACLRADWfiRCPDDVSHRFWCP 

50 SEG • 

PRD ccccchhhhhhhhcccccccccccccccccceeeeccchhhhhhhhccccccccccccch 

MEM 

SEfl RLAAFMERAGELATLPFTYTAEVTSETFNKEAFLASRGLTRGYUTCLSMLIPGPGFSRHP 

55 SEG 

PRD hhhhhhhhhhhhhccccccccchhhhhhhhhhhhhhhhcccccchhhhhccccccccccc 

MEM - 

SE(3 RGNTPSLSLLRGGDPY(3LL(2GDGTALNPPVPPHPPRGVFVPELNI(2NKt2SLKIHVVEAGK 

SEG -xxxxxxxxxxxxxx 

PRD cccccceeeeeccccceeeccccccccccccccccceeeeccccchhhhhheeeeeeccc 

HEM 

SE(2 EFDLVHVFUELLVLLPHVALEL(2FVGDGLPPESDEf2HFTLi3RDSLEVSVRPGSGIS ARPS 

SEG xxxxxxxxxxxxx 

PRD cccchhhhhhhhhchhhhhhhhhhhcccccccchhhhhhhccccceeeeccccccccccc 

men • - . iiririiiririririfirinrniMririri 

SEC SGTKEKGGRRDLfllKVSARPYHLFtJGPKPDLVIGFNSGFALKDTWLRSLPRLCSLRVPAF 
SEG • 

PRD ccccccccccceeeeeeccccccccccccceeeecccccccccccccccccccccccccc 
MEM . 

SE<2 FTESSEYSCVnDGdTMAVATGGGTSPPflPNPFRSPFRLRAADNCHSliJYCNAFIFHLVYKP 
SEG x 

PRD cccccceeeecccceeeeeecccccccccccccccchhhhhcchhhhhhhhhhhhhhccc 
MEM 

SE(2 AG2GSGARPAPGPPPPSPTPSAPPAPTRRRRGEKKPGRGARRRK 
SEG XX XX xxxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxxxx 
PRD ccccccccccccccccccccccccchhhhhccccccccccccc 

riEn 

(No Prosite data available for DKFZphtes3_7n12.1) 
(No Pfam data available for DKFZphtes3_7n12.1) 
DKFZphtes3_9e16 

group: transmembrane protein 

DKFZphtes3_9e16 encodes a novel 539 amino acid protein without 
similarity to known proteins. 

The novel protein contains 1 transmembrane region. The only EST 
described so far is from testis. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP 
motifs. 

The new protein can find application in studying the expression 
profile of testis-specific genes and as a new marker for 
testicular cells. 

putative protein 
1 EST hit 

perhaps complete cds. 
Sequenced by DKFZ 
Locus: unknown 



Insert length: 2011 bp 

Poly A stretch at pos. 1986, no polyadenylation signal found 



1 CATGGCAACA TGAGCAGTGC TGAGATAATT GGTTCTACAA ATCTTATAAT 
51 TCTGCTAGAG GATGAAGTCT TTGCCGATTT TTTCAACACA TTTCTTTCCC 
101 TCCCGGTTTT TGGTCAGACA CCATTTTATA CTGTTGAAAA TTCACAGTGG 
151 AGCTTGTGGC CAGAAATACC TTGTAACTTG ATTGCCAAAT ACAAAGGGTT 
201 ATTGACCTGG TTGGAAAAAT GCCGATTACC TTTCTTCTGT AAAACAAACT 
251 TGTGTTTCCA TTACATTCTC TGTCAGGAGT TCATCAGTTT CATTAAGTCC 
301 CCAGAAGGAG CCAAGATGAT GAGATGGAAA AAGGCAGACC AGTGGCTACT 
351 CCAGAAATGC ATTGGCGGGG TCAGAGGGAT GTGGCGCTTC TATTCCTACC 
401 TCACAGGCAG TGCAGGTGAA GAATTGGTGG ATTTCTGGAT CCTTGCTGAG 
451 AACATCCTGA GCATAGATGA GATGGACCTG GAAGTGAGAG ACTACTACCT 
501 GTCCCTCCTC CTCATGCTGA GGGCCACTCA TCTGCAGGAG GGCTCCAGGG 
551 TGGTAACCCT CTGTAACATG AACATCAAGT CCCTCCTGAA CCTCTCCATC 
601 TGGCATCCCA ACCAATCAAC CACTAGGAGG GAGATCCTGA GCCACATGCA 
651 GAAAGTGGCT CTGTTCAAAC TCCAGAGCTA TTGGCTTCCC AACTTTTACA 
701 CCCACACCAA GATGACCATG GCCAAGGAGG AAGCATGCCA TGGTCTGATG 
751 CAAGAGTACG AGACTCGCTT ATACAGCGTT TGCTACACCC ACATAGGAGG 
801 GCTCCCTCTG AACATGAGCA TCAAGAAGTG CCACCACTTT CAGAAACGGT 
851 ACTCAAGCAG GAAAGCCAAG AGGAAGATGT GGCAATTGGT AGATCCTGAC 
901 TCTTGGTCTC TGGAAATGGA TCTCAAGCCA GATGCTATTG GTATGCCCCT 
951 ACAGGAGACA TGTCCTCAAG AGAAGGTGGT TATACAAATG CCTTCCCTGA 
1001 AAATGGCTTC TTCAAAGGAA ACAAGAATCA GTTCCCTGGA AAAGGATATG 
1051 CATTATGCAA AAATATCCAG CATGGAGAAT AAAGCCAAGA GCCACCTCCA 
1101 CATGGAAGCC CCCTTTGAGA CAAAGGTCTC TACCCACCTG AGGACTGTCA 
1151 TCCCCATTGT CAATCACTCC TCCAAGATGA CAATTCAGAA GGCCATCAAG 
1201 CAAAGCTTCT CCTTAGGATA CATCCACTTG GCCTTGTGTG CTGATGCCTG 
1251 TGCAGGGAAC CCTTTCCGGG ACCACCTGAA GAAGCTGAAT TTGAAAGTGG 
1301 AGATCCAACT TCTTGACCTC TGGCAGGACT TGCAGCATTT CCTCAGTGTC 
1351 CTTCTGAATA ACAAAAAGAA TGGGAATGCA ATCTTTCGTC ACTTGCTGGG 
1401 TGACAGAATC TGCGAGCTCT ACCTGAATGA GCAGATTGGT CCGTGCTTAC 
1451 CACTCAAATC CCAAACCATT CAGGGCCTGA AGGAACTATT GCCCTCTGGG 
1501 GATGTGATCC CCTGGATTCC CAAAGCCCAG AAGGAGATTT GCAAGATGCT 
1551 CAGTCCCTGG TATGATGAGT TTCTAGATGA AGAGGACTAC TGGTTTCTCC 
1601 TTTTTACGGT AGGAAGGACT TTGGGTTAGG AAGGAATCAT GAGGATGAGG 
1651 GAAGAAGAAA GAGTAATTAC TGTTTTAAAA GGGTTATGTG TTAAAGTAAA 
1701 TGAAATTGTT ATTTTTCCTA GAGTCAACCA AAGATCAGCA TGGTCCCTGT 
1751 TGTTCTAAAG CTAAACCTCT CAAGGAAAAG GACTCAGTGC ATAAGATGAC 
1801 TTTGGTGAAA CCCCGTCTCT ACTAAAAATA CAAAAAATTA GCCGGGCGTA 
1851 GTGGCGGGCG CCTGTAGTCC CAGCTACTTG GGAGGCTGAG GCAGGAGAAT 
1901 GGTGTGAACC CGGGAGGCGG AGCTTGCAGT GAGCCGAGAT CCCGCCACTG 
1951 CACGCCAGCC TGGGCGACAG AGCGAGACTC CGTCTCAAAA AAAAAAAAAA 
2001 AAAAAAAAAA G 



BLAST Results 



50 

No BLAST result 



Medline entries 

55 

No Medline entry 



Peptide information for frame 1 

5 

ORF from 10 bp to 1626 bp; peptide length: 539 
Category: putative protein 
Classification: no clue 

10 1 MSSAEIIGST NLIILLEDEV FADFFNTFLS LPVFGtfTPFY TVENSdUSLU 

51 PEIPCNLIAK YKGLLTULEK CRLPFFCKTN LCFHYILCdE FISFIKSPEG 

1D1 AKfinRliJKKAD <3bJLL(2KCIGG VRGNURFYSY LTGSAGEELV DFbllLAENIL 

151 SIDEMDLEVR DYYLSLLLNL RATHLfiEGSR VVTLCNMNIK SLLNLSIWHP 

SOI NflSTTRREIL SHNflKVALFK LflSYULPNFY THTKIUMAKE EACHGLIK2EY 

15 251 ETRLYSVCYT HIGGLPLNMS IKKCHHF<3KR YSSRKAKRKI1 UtfLVDPDSWS 

301 LEMDLKPDAI GIIPLGETCPO EKVVIGMPSL KNASSKETRI SSLEKDMHYA 

351 KISSNENKAK SHLHMEAPFE TKVSTHLRTV IPIVNHSSKM TIt2KAIKt2SF 

M01 SLGYIHLALC ADACAGNPFR DHLKKLNLKV EIGLLDLUtJD LflHFLSVLLN 

M51 NKKNGNAIFR HLLGDRICEL YLNE(3IGPCL PLKS<2TI(2GL KELLPSGDVI 

20 501 PWIPKAdKEI CKMLSPUYDE FLDEEDYUFL LFTVGRTLG 



25 



BLASTP hits 
No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_9e16, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_9e16, frame 1 

Report for DKFZphtes3_9e16.1 

ELENGTHJ 5MB 
• EMUJ bE^Ob-Db 
40 EpIJ A- 35 

EKU3 Alpha_Beta 

SE(3 HGN[1SSAEIIGSTNLIILLEDEVFADFFNTFLSLPVFG(3TPFYTVENS(2li)SLI)JPEIPCNL 
45 PRD cccccceeeeccccceeehhhhhhhhhccccccccccccccccccccccccccccccchh 

SE<2 IAKYKGLLTULEKCRLPFFCKTNLCFHYILC(3EFISFIICSPEGAKnHRlJKKAD(2WLLc3KC 
PRD hhhhccceeecccccccccccccceeehhhhhhhhhhhccccchhhhhhhcchhhhhhhh 

50 SE(3 IGGVRGnblRFYSYLTGSAGEELVPFUILAENILSIDEf1DLEVRI>YYLSLLLriLRATHL(3E 
PRD ccccccceeeeeecccccccchhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhccc 

SEd GSRVVTLCNnNIKSLLNLSIWHPN(3STTRREILSHritfKVALFKLc3SYWLPNFYTHTKnTn 

PRD cceeeeecccchhhhhhhhhccccccchhhhhhhhhhhhhhhhhhhccccccchhhhhhh 

55 

SE<3 AKEEACHGLH(3EYETRLYSVCYTHIGGLPLNI1SIKKCHHF(3KRYSSRKAKRKriliJ(3LVDPD 

PRD hhhhhhhhhhhhhhhhheeeeeeccccccccccccccccchhhhhhhhhhhhhheeeccc 

SE(3 SUSLEnDLKPDAIGI1PL(3ETCP(3EKVVIi2l1PSLKnASSKETRISSLEK:i>MHYAKISSnEN 

PRD cccccccccccccccccccccccceeeeeccccccccccccccchhhhhccccchhhhhh 

SEt3 KAKSHLHMEAPFETKVSTHLRTVIPIVNHSSK[1TIc3KAIK(3SFSL6YIHLALCADACA6N 

5 PRD hhhhhhhcccccccccccccGeeeeeeccccchhhhhhhhhhcccccccchhhhhhcccc 

SEC2 PFRDHLKKLNLKVEI(3LLDLWt3DLr3HFLSVLLNNKKNGNAIFRHLLGPRICELYLNEi3IG 

PRD ccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccceeeecccchhh 

10 SE<2 PCLPLKS(3TI(3GLKELLPSGDVIPIJIPKA(3KEICKI1LSPUYDEFLDEEDYUFLLFTVGRT 

PRD ccccccchhhhhhhhccccccceeecccchhhhhhhcccchhhhhccccceeeccccccc 



15 



20 



30 



45 



SE(2 LG 
PRD cc 



(No Prosite data available for DKFZphtes3_9e16.1) 
(No Pfam data available for DKFZphtes3_9e16.1) 



PROSITE is a database of protein families and domains. 
It consists of biologically significant sites, patterns and 
profiles that help to reliably identify to which known protein 
family (if any) a new sequence belongs. World Wide Web URL 
http://www.expasy.ch/prosite/ is the entry point to the database. 
A description of the Prosite consensus patterns follows. 
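
Consensus patterns in PROSITE notation can be matched against a peptide by translating them into regular expressions. The sketch below is a minimal, non-authoritative converter that assumes the standard notation: elements separated by `-`, `x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repetition, and `<` or `>` for an N- or C-terminal anchor. The function names `prosite_to_regex` and `scan` and the test peptides are hypothetical; anchors placed inside square brackets are not handled.

```python
import re

# One pattern element: a core (x, a residue, a [set] or an {exclusion}),
# optionally followed by a repetition such as (2) or (2,3).
_ELEMENT = re.compile(
    r"(?P<core>x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})"
    r"(?:\((?P<lo>\d+)(?:,(?P<hi>\d+))?\))?"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE consensus pattern into a Python regex string."""
    body = pattern.strip().rstrip(".")
    pieces = []
    for element in body.split("-"):
        element = element.strip()
        prefix = suffix = ""
        if element.startswith("<"):      # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # C-terminal anchor
            suffix, element = "$", element[:-1]
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core = match.group("core")
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if match.group("lo"):
            hi = match.group("hi")
            repeat = "{" + match.group("lo") + ("," + hi if hi else "") + "}"
        pieces.append(prefix + core + repeat + suffix)
    return "".join(pieces)


def scan(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position, matched text) for all hits, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]


# N-glycosylation site pattern applied to a hypothetical peptide.
print(prosite_to_regex("N-{P}-[ST]-{P}."))
print(scan("N-{P}-[ST]-{P}.", "AANGSAANPSA"))
```

The lookahead in `scan` reports overlapping hits, which matters for short, frequent motifs such as phosphorylation sites.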



NAME: N-glycosylation site. 
CONSENSUS: N-{P}-[ST]-{P}. 

NAME: Glycosaminoglycan attachment site. 
CONSENSUS: S-G-x-G. 

NAME: Tyrosine sulfation site. 

NAME: cAMP- and cGMP-dependent protein kinase 
phosphorylation site. 
CONSENSUS: [RK](2)-x-[ST]. 

NAME: Protein kinase C phosphorylation site. 
CONSENSUS: [ST]-x-[RK]. 

NAME: Casein kinase II phosphorylation site. 
CONSENSUS: [ST]-x(2)-[DE]. 

NAME: Tyrosine kinase phosphorylation site. 
CONSENSUS: [RK]-x(2,3)-[DE]-x(2,3)-Y. 

NAME: N-myristoylation site. 
CONSENSUS: G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}. 

NAME: Amidation site. 
CONSENSUS: x-G-[RK]-[RK]. 

NAME: Aspartic acid and asparagine hydroxylation site. 
CONSENSUS: C-x-[DN]-x(4)-[FY]-x-C-x-C. 

NAME: Vitamin K-dependent carboxy lation domain- 

CONSENSUS: x ( 12 ) -E-x ( 3 ) -E-x-C-x ( fc> ) -OENiD-x-ELIVMFYID-x ( ^ ) - 

5 EFYhO- 

NAME: Phosphopantetheine attachment site- 

CONSENSUS: OEflGSTALMKRH J-IELIVMFYSTACID-EGNtJID-IELIVMF YAGJ- 

CDNEKHSl-S-IELIVriST3- 
10 CONSENSUS : {PCFY>-CST AGCPtfLIVMFJ-ELI VMATNJ-EDENtfGTAKRHLMID- 

ELIVMUSTAJ-QILIVGSTACRID- 
CONSENSUS: x (B) -ELIVMFA 1 - 

NAME: Acyl carrier protein phosphopantetheine domain 

15 profile. 

NAME: Prokaryotic membrane lipoprotein lipid attachment 

site • 

CONSENSUS: {DERK> Cfc, ) -ELIVriFUSTAG 1 ( 2 ) -ELIVMFYSTAGCfllD-lLAGSID-C . 



20 



30 



45 



NAME: Prokaryotic N-terminal methylation site- 

CONSENSUS: EKRHEtfSTAG J-G-EFYLIVMJ-ESTJ-ELTID-IELIVP J-E- 

ELIVMFWSTAGJCm) . 



NAME: Prenyl group binding site (CAAX box). 
CONSENSUS: C-{DENQ}-[LIVM]-x>. 

NAME: Protein splicing signature. 
CONSENSUS: [DNEG]-x-[LIVFA]-[LIVMY]-[LVAST]-H-N-[STC]. 

NAME: Endoplasmic reticulum targeting sequence. 
CONSENSUS: [KRHQSA]-[DENQ]-E-L>. 

NAME: Microbodies C-terminal targeting signal. 
CONSENSUS: [STAGCN]-[RKH]-[LIVMAFY]>. 

NAME: Gram-positive cocci surface proteins 'anchoring' 
hexapeptide. 
CONSENSUS: L-P-x-T-G-[STGAVDE]. 

40 

NAME: Bipartite nuclear targeting sequence. 

NAME: Cell attachment sequence. 
CONSENSUS: R-G-D. 

NAME: ATP/GTP-binding site motif A (P-loop). 
CONSENSUS: [AG]-x(4)-G-K-[ST]. 

NAME : Cyclic nucleot ide-binding domain signature 1 . 

50 CONSENSUS: ELIVM3-EVIO-X (E)-G-EDEN(2TAll-x-IEGAC3-x(2)- 

ELIVMFYJ(M)-x(H)-G. 

NAME : Cyclic nucleot ide-binding domain signature 2 . 

CONSENSUS : ELIVMFJ-G-E-x-EGASJ-ELIVMJ-x (5,11) -R-CST AtO-A-x- 

55 ELIVMAID-x-ESTACVID. 

NAME : cAMP/cGMP binding motif. 
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NAME: EF-hand calcium-binding domain- 

CONSENSUS: D-x-ONSJ-{ILVFYU>-OENSTG3-ON(2GHRK3-{GP>- 

lELIVMO-EDENtfSTAGCJ-xCB)- 

CONSENSUS: EDEHl-OILIVMF YUH - 

5 

NAME: Actinin-type actin-binding domain signature 1- 

CONSENSUS: EE(2J-x ( B ) -EATVJ-IEFYID-x ( B ) -U-x-N - 

NAME: Actinin-type actin-binding domain signature B . 

10 CONSENSUS: CLI VM3-X-CSGN3I-CLI Vn3-CI> AGHEH-CSAGl-x-lEDNEAGll- 

CLIVM3-x-CDEAGni-x(M)- 

CONSENSUS: CLIVn3-x-CLI13-CSAG3-CLIVnil-ELIVMTll-ljJ-x-CLIVri3 (B ) - 

NAME: Anaphylatoxin domain signature- 

15 CONSENSUS: ECSHJ-C-x (B ) -EGAPID-x (? fi ) -EGASTDEdRID-C-EGASTDEflLl- 

x(3-, c 1>-n:GASTDE(3Nni-x(B)- 
CONSENSUS: ECE3-X < b 7) -C-C - 

NAME: Anaphylatoxin domain profile- 

20 

NAME: Apple domain- 

CONSENSUS : C-x (3> -ELIVMFYID-x ( 5) -CLIVMFYD-x ( 3 ) -C3>EN(31- 

CLIVMFY3-x(lQ)-C-x(3)-C-T- 

CONSENSUS: x ( M ) -C-x-ELIVMF Y3-F-X-EFYJ-X ( 13 -»m ) -C-x-ELIVMFYD- 

25 ERO-x-ICST:D-x(14-.15)- 

CONSENSUS: S-G-X-EST3-ELIVMFY31-X ( B ) -C - 

NAME: Band 4-1 family domain signature 1- 

CONSENSUS: lil-CLIVJ-x (3) -EKRO-x-CLIVMJ-x (B) -Ct3HH-x CD -,B >- 

30 ELIVMFJ-xC b-.fi) -ELI VMFJ- 

CONSENSUS: x (3-,5> -F-EFYJ-x (B)-CDENSJ - 

NAME: Band M-l family domain signature B- 

CONSENSUS: EHYWJ-x ) -EDENflSTVJ-ESAJ-x ( 3 > -EFY J-ELIVMJ-x ( B ) - 

35 CACV3-x(B)-ELM3-x(B)- 

CONSENSUS : EF Y J-G-x-EDEN<3ST J-ELIVMF YS 1 - 

NAME : Band 4-1 family domain profile- 

40 NAME: Clq domain signature. 

CONSENSUS: F-x ( 5 ) -ENDJ-x ( 4 ) -EFYULJ-x ( b) -F-x (5 ) -G-x-Y-x-F-x- 

EFY2- 



NAME : C-terminal cystine knot signature- 
45 CONSENSUS: C-C-x < 13) -C-x < B ) -EGNIB-x < IB ) -C-x-C-x C B 4 ) -C - 

NAME: C-terminal cystine knot profile- 

NAME: CUB domain profile- 


NAME: Death domain profile- 

NAME: EGF-like domain signature 1.
CONSENSUS: C-x-C-x(5)-G-x(2)-C.


NAME: EGF-like domain signature 2.
CONSENSUS: C-x-C-x(2)-[GP]-[FYW]-x(4,8)-C.
NAME: Calcium-binding EGF-like domain pattern signature.

CONSENSUS: [DEQN]-x-[DEQN](2)-C-x(3,14)-C-x(3,7)-C-x-[DN]-
x(4)-[FY]-x-C.

NAME: Laminin-type EGF-like (LE) domain signature.

CONSENSUS: C-x(1,2)-C-x(5)-G-x(2)-C-x(2)-C-x(3,4)-[FYW]-
x(3,15)-C.

NAME: Coagulation factors 5/8 type C domain (FA58C) signature 1.

CONSENSUS: EGASH-U-x (7«il5)-EFYbD-ELIV3-x-inLIVFA3l-fl!GSTDEN31- 

x(b)-CLIVF3-x(2)-CIVJ-x- 

CONSENSUS: ELIVTH-EflKrU-G - 

NAME: Coagulation factors 5/8 type C domain (FA58C) signature 2.

CONSENSUS: P-x ( fi n 10 ) -ELrO-R-x-IIIGEIB-IELIVPIB-x-G-C: - 






NAME: Forkhead-associated (FHA) domain profile.

NAME: Fibrinogen beta and gamma chains C-terminal domain 
signature. 

CONSENSUS: U-bl-CLIVMFYUl-x (2)-C-x(2)-!EGSAJ-x(2)-N-G- 

25 NAME: Type I fibronectin domain. 

CONSENSUS: C-x ( t -» fi ) -ELFYJ-x ( 5 ) -CFYlO-x-ERK J-x C fi -, 10 ) -C-x-C- 

xCfa-i^D-C- 

NAME: Type II fibronectin collagen-binding domain- 
30 CONSENSUS: C-x(2)-P-F-x-EFYLJIl-x(7>-C-x(fiTlD>-U-C-x( l »)- 
ICI>NSR3-D[FYUJ-x<3,S)-CFYU3-x- 
CONSENSUS: {LFYUIJ-C - 

NAME: Hemopexin domain signature- 
35 CONSENSUS: ELIFAIJ^x ( 3 ) -U-x ( 2 -.3) -CPE3-X < 2 ) -CLIVI1FY J-CDEN(3S3 

CSTA3-ICAV3-!ELIVnFY3- 






NAME: Kringle domain signature.

CONSENSUS: [FY]-C-R-N-P-[DNR].

NAME: Kringle domain profile- 



NAME: LDL-receptor class A (LDLRA) domain signature.

CONSENSUS: C-[VILMA]-x(5)-C-[DNH]-x(3)-[DENQHT]-C-x(3,4)-
[STADE]-[DEH]-[DE]-x(1,5)-C.

NAME: LDL-receptor class A (LDLRA) domain profile. 

NAME: C-type lectin domain signature.

CONSENSUS: C-[LIVMFYATG]-x(5,12)-[WL]-x-[DNSR]-x(2)-C-x(5,6)-
[FYWLIVSTA]-[LIVMSTA]-C.

NAME: C-type lectin domain profile.

NAME: Link domain signature.

CONSENSUS: C-x(15)-A-x(3,4)-G-x(3)-C-x(2)-G-x(8,9)-P-x(7)-C.
NAME: Osteonectin domain signature 1.

CONSENSUS: C-x-[DN]-x(2)-C-x(2)-G-[KRH]-x-C-x(6,7)-P-x-C-x-C-
x(3,5)-C-P.


NAME: Osteonectin domain signature 2.

CONSENSUS: F-P-x-R-[IM]-x-D-W-L-x-[NQ].

NAME: Somatomedin B domain signature.

CONSENSUS: C-x-C-x(3)-C-x(5)-C-C-x-[DN]-[FY]-x(3)-C.

NAME: Thyroglobulin type-1 repeat signature.

CONSENSUS: [FYWHP]-x-P-x-C-x(3,4)-G-x-[FYW]-x(3)-Q-C-x(4,10)-
C-[FYW]-C-V-x(3,4)-[SG].

NAME: P-type 'Trefoil' domain signature.

CONSENSUS: R-x(2)-C-x-[FYPST]-x(3,4)-[ST]-x(3)-C-x(4)-C-C-[FYWH].


NAME: Cellulose-binding domain, bacterial type.

CONSENSUS : U-N-ESTAGR3-ESTDN3-ELIVM J-x (B > -EGST3-X-EGST3-X (E ) - 

ELIVMFT3-EGA3 • 

NAME: Cellulose-binding domain, fungal type.

CONSENSUS: C-G-G-x ( H -,7 ) -G-x ( 3 ) -C-x ( 5 ) -C-x ( 3 5 ) -ENHG3-X- 

EFYldM3-x(B)-G-C- 

NAME: Chitin recognition or binding domain signature.

CONSENSUS: C-x(4,5)-C-C-S-x(2)-G-x-C-G-x(4)-[FYW]-C.

NAME: Barwin domain signature 1- 

CONSENSUS: C-G-EKR3-C-L-x-V-x-N - 

NAME: Barwin domain signature 2.

CONSENSUS: V-[DN]-Y-[EQ]-F-V-[DN]-C.

NAME : BIR repeat- 

CONSENSUS: EHKEPILVY3-X ( B ) -R-x ( 3 -»7 ) -EFYLO-x (11 -.1M ) -ESTAN3-G- 

40 ELMF3-X-EFYHDA3-X (M ) - 

CONSENSUS : EDESL3-X (B-. 3 ) -C-X (B ) -C-X (b ) -EUA3-X ( 1 ) -H-X C «4 ) - 

EPRSD3-X-C-XCB)-ELIVMA3- 

NAME: WAP-type 'four-disulfide core' domain signature.

CONSENSUS: C-x-{C}-[DN]-x(2)-C-x(5)-C-C.

NAME: Phorbol esters / diacylglycerol binding domain- 

C0NSENSUS: H-x-ELIVMFYU3-x (&ill)-C-x(B)-C-x(3) -ELIVMFC3- 

x(5-,10)-C-x(B)-C-x(M)-EHD3- 
50 CONSENSUS: x (E) -C-x CS-iT) -C • 

NAME: C2 domain signature.

CONSENSUS: [ACG]-x(2)-L-x(2,3)-D-x(1,2)-[NGSTLIF]-[GTMR]-x-
[STAP]-D-[PA]-[FY].






NAME: C2-domain profile.

NAME: CAP-Gly domain signature 
CONSENSUS: G-x(8,10)-[FYW]-x-G-[LIVM]-x-[LIVMFY]-x(4)-G-K-
[NH]-x-G-[STAR]-x(2)-G-x(2)-[LY]-F.

NAME: Ly-6 / u-PAR domain signature.

CONSENSUS ■ CEt2R3-C-CLIVI1FYAHJ-x-C-x(S-.fl)-C-x(3-,6)-lLE])N(2STV]l- 

C—CO-x (5)-C- 

CONSENSUS: x(12-.24)-C. 

10 NAME: MAM domain signature- 

CONSENSUS: G-X-ILIVI1FY3 ( 2 ) -x ( 3) -ESTA3-X (10-, 11 ) -ELVID-x ( 4 ) - 

ELIVMFJ-x ( b -. 7 ) -c-CLivro-x- 

CONSENSUS: F-x-ELIVMFYJ-x ( 3 ) -EGSC3 - 

NAME: MAM domain profile.

NAME: PH domain profile- 

NAME: Phosphotyrosine interaction domain (PID) profile- 

NAME: Src homology 2 (SH2) domain profile- 

NAME: Src homology 3 (SH3) domain profile.

NAME: VWFC domain signature.

CONSENSUS: C-x(2,3)-C-x-C-x(6,14)-C-x(3,4)-C-x(2,10)-C-
x(9,16)-C-C-x(2,4)-C.

NAME: WW/rsp5/WWP domain signature.

30 CONSENSUS: U-x (=5-.!! > -EVFY3-EFYIO-X ( b-. 7) -EGSTNE3-IGST(2CR3- 

EFYlO-x(2)-P. 

NAME: WW/rsp5/WWP domain profile.

NAME: ZP domain signature.

CONSENSUS: ELIVMFYW3-X C 7 ) -ESTAPDNLJ-x OJ-ELIVMFYliD-x- 

ELIVMFYU3-x-ELIVMFYU3-x(2)-C- 

CONSENSUS: ELIVMFYU3-x-CST3-EPSL3-x (2 -.4 ) -EDENS3-X-ESTADNC3LF3- 

x C b > -ELIVMI < 2 > -x < 3-.M > - 
40 CONSENSUS: C- 

NAME: S-layer homology domain signature- 

CONSENSUS: ELVFYT3-x-EDA3-x (2, S ) -EDNGSATPHY3-EWYFPDA3-X ( 4 ) - 

ELIV3-x(2>-EGTALV3- 
45 CONSENSUS: x <4 ,b ) -CLIVFYC3-X (2) -G-X-EPGSTA3-X C2-.3) -EMFYAJ-x- 

EPGAV3-x(3-.10)-ELIVMA3- 

CONSENSUS : ESTKR3-ERY3-x-EE(J3-x-ESTALIVM3 - 

NAME: 'Homeobox' domain signature- 

50 CONSENSUS: ELIVMFYG3-EASLVR3-X ( 2 ) -ELIVMSTACN3-X-ELIVM3-X ( 4 ) - 

ELIV3-ERKNflESTAIY3- 

CONSENSUS: ELIVFSTNKH3-U-EFYVC3-x-ENI>C}TAH3-x (S > -ERKNAIMU3 - 






NAME: 'Homeobox' domain profile. 

NAME: 'Homeobox' antennapedia-type protein signature.

CONSENSUS: [LIVMFE]-[FY]-P-W-M-[KRQTA].
NAME: 'Homeobox' engrailed-type protein signature.
CONSENSUS: L-M-A-[EQ]-G-L-Y-N.

NAME: 'Paired box' domain signature.
CONSENSUS: R-P-C-x(11)-C-V-S.

NAME: 'POU' domain signature 1.

CONSENSUS: [RKQ]-R-[LIM]-x-[LF]-G-[LIVMFY]-x-Q-x-[DNQ]-V-G.

NAME: 'POU' domain signature 2.

CONSENSUS: S-Q-[ST]-[TA]-I-[SC]-R-F-E-x-[LSQ]-x-[LI]-[ST].






NAME: Zinc finger, C2H2 type, domain.

CONSENSUS: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.
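A consensus with variable-length gaps, such as the C2H2 zinc finger entry, can match at more than one start position in a protein. The following is a minimal sketch of scanning a sequence for every matching start, assuming the entry reads C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H. The regular expression, the `scan` helper and the toy sequence are illustrative assumptions and are not part of the original listing.

```python
import re

# Hand-translated regex for C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
# (assumed reading of the consensus entry above).
C2H2_REGEX = r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H"


def scan(sequence, regex):
    """Return (1-based start, matched text) for every start position that matches.

    A zero-width lookahead is used so that overlapping matches are reported;
    each start position yields its greedy (longest-gap) match.
    """
    lookahead = re.compile("(?=(" + regex + "))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(sequence)]


# Hypothetical toy sequence built to contain exactly one C2H2-like motif.
toy_sequence = "MK" + "CAAC" + "AAA" + "F" + "A" * 8 + "H" + "AAA" + "H" + "GS"
hits = scan(toy_sequence, C2H2_REGEX)
for start, text in hits:
    print(start, text)
```

Positions are reported 1-based, the usual convention for protein coordinates; a real screen would iterate this over every consensus in the listing.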

NAME: Zinc finger, C3HC4 type (RING finger), signature.
CONSENSUS: C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA].



NAME: Nuclear hormones receptors DNA-binding region signature.

CONSENSUS: C-x(2)-C-x-[DE]-x(5)-[HN]-[FY]-x(4)-C-x(2)-C-x(2)-
F-F-x-R.

NAME: GATA-type zinc finger domain.
CONSENSUS: C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-
x(3,4)-C-N-[AS]-C.

NAME: Poly(ADP-ribose) polymerase zinc finger domain signature.

30 CONSENSUS: C-EKRJ-x-C-x (3) -I-x-K-x (3) -ERGD-x (lb-,lfi )-lil-CFYH3- 

H-x(2)-C- 



NAME: Poly(ADP-ribose) polymerase zinc finger domain profile.



NAME: Fungal Zn(2)-Cys(6) binuclear cluster domain signature.

CONSENSUS: [GASTPV]-C-x(2)-C-[RKHSTACW]-x(2)-[RKHQ]-x(2)-C-
x(5,12)-C-x(2)-C-x(6,8)-C.

NAME: Fungal Zn(2)-Cys(6) binuclear cluster domain profile.

NAME: Prokaryotic dksA/traR C4-type zinc finger.

CONSENSUS: C-[DES]-x-C-x(3)-I-x(3)-R-x(4)-P-x(4)-C-x(2)-C.

NAME: Copper-fist domain signature.

CONSENSUS: M-[LIVMF](3)-x(3)-K-[MY]-A-C-x(2)-C-I-[KR]-x-H-
[KR]-x(3)-C-x-H-x(8)-[KR]-x-[KR]-G-R-P.

NAME : Copper fist DNA binding domain profile • 

NAME: Leucine zipper pattern.

CONSENSUS: L-x(6)-L-x(6)-L-x(6)-L.
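The consensus entries in this listing follow the PROSITE pattern syntax: elements are separated by hyphens, x stands for any residue, square brackets list allowed residues, braces list excluded residues, and a parenthesized number or range gives a repeat count. A minimal sketch of translating such a pattern into a regular expression is given below, using the leucine zipper pattern read as L-x(6)-L-x(6)-L-x(6)-L. The function name `prosite_to_regex` and the toy sequence are assumptions made for illustration only.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        element = element.strip()
        anchor_start = element.startswith("<")  # '<' pins the N-terminus
        anchor_end = element.endswith(">")      # '>' pins the C-terminus
        element = element.strip("<>")
        # Split off an optional repeat count such as (6) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError("empty pattern element")
        core, lo, hi = m.groups()
        if core == "x":
            atom = "."                          # any residue
        elif core.startswith("[") and core.endswith("]"):
            atom = core                         # allowed residues
        elif core.startswith("{") and core.endswith("}"):
            atom = "[^" + core[1:-1] + "]"      # excluded residues
        elif len(core) == 1 and core.isalpha() and core.isupper():
            atom = core                         # single fixed residue
        else:
            raise ValueError("unrecognised element: " + core)
        if lo is not None:
            atom += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(("^" if anchor_start else "") + atom + ("$" if anchor_end else ""))
    return "".join(parts)


LEUCINE_ZIPPER = "L-x(6)-L-x(6)-L-x(6)-L."
zipper_regex = prosite_to_regex(LEUCINE_ZIPPER)

# Hypothetical toy sequence: two leading residues, then four leucines spaced by six.
toy_sequence = "MA" + "LAAAAAA" * 3 + "L"
match = re.search(zipper_regex, toy_sequence)
print(zipper_regex, match.start() + 1)
```

The converter rejects elements it does not recognise rather than guessing, which is useful when a pattern has been damaged in transcription.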

NAME : bZIP transcription factors basic domain signature- 
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CONSENSUS: CKR J-x ( 1, 3 ) -ERKSAfl J-N-x ( 2 ) -ESAtO (5 ) -x-ERKTAENflS-x- 

R-x-ERO- 

NAME: Myb DNA-binding domain repeat signature 1.
CONSENSUS: W-[ST]-x(2)-E-[DE]-x(2)-[LIV].

NAME: Myb DNA-binding domain repeat signature 2.

CONSENSUS: W-x(2)-[LI]-[SAG]-x(4,5)-R-x(8)-[YW]-x(3)-[LIVM].

NAME: Myc-type, 'helix-loop-helix' dimerization domain signature.

CONSENSUS: EDENSTAP3-K-ELIVMiiJAGSN3-{FYUCPHKR}-ELIVT3-ELIV3- 
x(2>-ESTAV3-ELIVMSTAC3-x- 

C0NSENSUS: EVMFYH3-ELIVMTA3--CP:)— CP>-ELIVMSR3 - 






NAME: p53 tumor antigen signature- 

CONSENSUS : M-C-N-S-S-C-M-G-G-M-N-R-R - 



NAME : CBF-A/NF-YB subunit signature- 
20 CONSENSUS : C-V-S-E-x-I-S-F-ELIVM3-T-ESG3-E-A-ESC3-EDE3-EKR(23- 
C 

NAME: CBF-B/NF-YA subunit signature.

CONSENSUS: Y-V-N-A-K-Q-Y-x-R-I-L-K-R-R-x-A-R-A-K-L-E.

25 

NAME: 'Cold-shock' DNA-binding domain signature- 
CONSENSUS: EFY3-G-F-I-x(b-.7) -EDER3-ELIVM3-F-x-H-x-ESTKR3-x- 

ELIVMFY3- 

30 NAME: CTF/NF-I signature. 

CONSENSUS: R-K-R-K-Y-F-K-X-H-E-K-R - 

NAME: Ets-domain signature 1- 

CONSENSUS: L-EFYli)3-E(2EDH3-F-ELI3-ELV<3K3-x-ELI3-L - 

NAME: Ets-domain signature 2- 

C0NSENSUS: ERKH3-X (2 ) -M-x-Y-EDEN(23-x-ELIVM3-ESTAG3-R-ESTAG3- 



ELI3-R-X-Y- 
40 NAME: Ets-domain profile- 



NAME: Fork head domain signature 1- 

CONSENSUS: EKR3-P-EPTC3-EFYLV(2H3-S-EFY3-x C 2 ) -ELIVM3-X (3-. M > - 

EAC3-ELIM3 • 

NAME: Fork head domain signature 2. 

CONSENSUS : lil-EflKR3-ENS3-S-ELIV3-R-H . 

NAME: Fork head domain profile. 

NAME: HSF-type DNA-binding domain signature- 

CONSENSUS: L-x (3 > -EFY3-K-H-x-N-x-ESTAN3-S-F-ELIVM3-R-(2-L- 

ENH3-X-Y-X-EFYU3-ERKH3-K- 

CONSENSUS: ELIVM3- 

NAME: Tryptophan pentad repeat (IRF family) signature. 

CONSENSUS: ld-x-EDNH3-x ( 5 ) -ELIVF3-x-EIV3-P-ld-x-H-x ( T , ID ) -EDE3- 

x(2)-ELIVF3-F-EKR(33-x- 
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CONSENSUS: EUR3-A - 

NAME: LIM domain signature.

CONSENSUS: C-x(2)-C-x(15,21)-[FYWH]-H-x(2)-[CH]-x(2)-C-x(2)-
C-x(3)-[LIVMF].

NAME: LIM domain profile.

NAME: NF-kappa-B/Rel/dorsal domain signature. 

CONSENSUS: F-R-Y-x-C-E-G.

NAME: MADS-box domain signature- 

CONSENSUS: R-x-ERK3-x (S>-I-x-EI>N3-xC3)-IKR3-x<2)-T-EFY3-x- 

ERK3(3)-x(2>-ELIVM3-x- 
15 CONSENSUS: K (2)-A-x-E-ELIVM3-EST3-x-L-x CM ) -ELIVM3-X- 

ELIVM3C3)-x(fe.)-ELIVMF3-x<2)- 
CONSENSUS: EFY3- 






NAME: MADS-box domain profile.

NAME: T-box domain signature 1- 

• CONSENSUS: L-W-x (2 ) -EFO-x ( 3-. M >-ENT3-E-M-ELIV3 (E ) -T-x (2 ) -G- 

ERG3-EKR<23- 

25 NAME: T-box domain signature 2- 

CONSENSUS: ELIVMYU3-H-EPADH3-EDEN3-EGS3-X ( 3) -G-x (2 ) -W-M-x ( 3) - 

EIVA3-X-F. 

NAME: TEA domain signature. 

30 CONSENSUS: G-R-N-E-L-I-x < 2) -Y-I-x < 3) -ETC3-X < 3 ) -R-T-ERK3 <2 > -C2- 

ELIVM3-S-S-H-ELIVM3- 
CONSENSUS: (3-V- 

NAME: Transcription factor TFIIB repeat signature- 

35 CONSENSUS: G-EKR3-X (3 >-ESTAGN3-x-ELIVMYA3-rEGSTA3 (2 > -ECSAV3- 

ELIVM3-ELIVMFY3-ILIVMA3- 
CONSENSUS: EGSA3-ESTAC3 - 

NAME: Transcription factor TFIID repeat signature- 

40 CONSENSUS: Y-x-P-x (2 ) -EIF3-X (2 > -ELIVM3 (2) -x-IKRH3-x (3) -P- 

ERK<33-x(3>-L-ELIVM3-F-x- 

CONSENSUS: ESTN3-G-EKR3-ELIVM3-X (3) -G-ETAGL3-EKR3-X (7 ) -EAGC3- 

x(7)-ELIVI13. 

45 NAME: TFIIS zinc ribbon domain signature- 

CONSENSUS: C-x (2 ) -C-x ) -ELIVMflSAR3-E(JH3-EST(2L3-ERA3-ESACR3- 

X-EDE3-EDET3-EPGSEA3- 

CONSENSUS: x (h ) -C-x (E -.5 ) -C-x (3 ) -EFU3 - 

50 NAME: TSC-22 / dip / bun family signature- 

CONSENSUS: M-D-L-V-K-x-H-L-x (2 ) -A-V-R-E-E-V-E - 

NAME: Prokaryotic transcription elongation factors signature 

1. 

55 CONSENSUS: EST3-X ( 2 > -EGS3-X C3 ) -ELI3-X (E >-E-L-x (E ) -L-x (3-, -R- 

x(E)-EIV3-x(3)-ELIV3- 

CONSENSUS: x Cb)-G-D-x C2)-E-N-EGSA3-x-Y- 
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NAME: Prokaryotic transcription elongation factors signature 
2- 

CONSENSUS: S-x C2 ) -S-P-CLIVMID-EAGJ-x-ESAG J-ELIVMll-ELIVMYID- 

xCLO-OGJ-OEJ. 

5 

NAME: DEAD-box subfamily ATP-dependent helicases signature.
CONSENSUS: [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN].

NAME: DEAH-box subfamily ATP-dependent helicases signature.
CONSENSUS: [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR].

NAME: Eukaryotic putative RNA-binding region RNP-1 signature.

CONSENSUS: [RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYLM].
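Because every element of a consensus contributes a fixed or bounded number of residues, the shortest and longest sequence a pattern can match follow directly from its elements. The sketch below computes those bounds, assuming the RNP-1 entry reads [RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYLM]; the function name `length_bounds` is hypothetical and the element grammar covers only the syntax used in this listing.

```python
import re

# One pattern element: x, a single residue, [allowed], or {excluded},
# optionally followed by a repeat count (n) or range (n,m).
ELEMENT = re.compile(r"(?:x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def length_bounds(pattern):
    """Return (min_length, max_length) of sequences a PROSITE-style pattern can match."""
    pattern = pattern.strip().rstrip(".")
    lo_total = hi_total = 0
    for element in pattern.split("-"):
        element = element.strip().strip("<>")
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unrecognised element: " + repr(element))
        lo = int(m.group(1)) if m.group(1) else 1   # no count means exactly one residue
        hi = int(m.group(2)) if m.group(2) else lo  # no upper bound means a fixed count
        lo_total += lo
        hi_total += hi
    return lo_total, hi_total


rnp1_bounds = length_bounds("[RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYLM].")
print(rnp1_bounds)
```

Such bounds give a quick consistency check on a transcribed pattern, since a fixed-length motif such as the RNP-1 octamer should yield equal minimum and maximum lengths.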






NAME: Fibrillarin signature- 

CONSENSUS: EGSTJ-ELIVMAP3I-V- Y-A-EI V3-E-EF Y2-CSA3-X-R-X (2 ) -R- 

EDE3. 



NAME: MCM family signature.

CONSENSUS: G-[IVT]-[LVAC](2)-[IVT]-D-[DE]-[FL]-[DNST].

NAME: MCM family domain* 

25 NAME: XPA protein signature 1- 

CONSENSUS: C-x-OEJ-C-x < 3 ) -ELIVMF3-X (l-.2)-D-x (2)-L-x(3)-F- 

x(4)-C-x(2)-0 

NAME : XPA protein signature 2- 
30 CONSENSUS: ELI VM 1 ( 2 ) -T-EKR J-T-E-x-K-x-EDEJ-Y-ELI VMF 1 ( 2 > -x-D- 

x-EDEJ- 



NAME: XPG protein signature 1- 

CONSENSUS: EVO-EKREID-P-x-EFYILID-V-F-D-G-x (2 ) -CPIL J-x-ELVO- 



35 K« 



NAME: XPG protein signature 2- 

CONSENSUS : EGSII-ELIVMll-EPERni-EF YS3-ELI VMH-x- A-P-x-E- A-EDE3- 

EPASJ-EflSH-ECLMlN 



NAME: Bacterial regulatory proteins, araC family signature.

CONSENSUS: EKRCO-ELIVMA J-x ( 2 ) -EGSTALISO--CF YUPGDN>-x ( 2 ) - 

ELIVMSAJ-xCM^-ELIVMFJ- 

C0NSENSUS: x (2) -ELIVMSTAID-EGST ACILID-x ( 3) -EGANflRFJ-ELIVMFYl- 

45 x(HnS)-ELFY3-x(3)- 

C0NSENSUS: EFYIVA3-{FYWHCM}-x(3)-EGSADEN(3KRl-x-ENSTAPKLJ- 
EPARL3 - 

NAME: Bacterial regulatory proteins, araC family DNA-binding domain profile.

NAME: Bacterial regulatory proteins, arsR family signature.

CONSENSUS : C-x (2 ) -D-ELIVMJ-x ( t ) -ESTJ-x CM ) -S-EHYRJ-EHO - 

NAME: Bacterial regulatory proteins, asnC family signature.

CONSENSUS : EGSTAPJ-x (2 ) -ONEAH-ELIVMJ-EGSA J-x (2) -ELIVMFY J- 

EGN3-ELIVMST3-EST3I-x(b)-R- 
CONSENSUS: EL VTJ-x < 2) -ELIVMJ-x (3) -G - 
NAME: Bacterial regulatory proteins, crp family signature.

CONSENSUS: ELIVI13-ESTAG3-ERHNU3-X ( 2 ) -CLIMl-EGAH-x-ELIVriF YA3- 

ELIVSC3-EGA3-X-ESTACN3- 
5 CONSENSUS: x<2) -EI1ST3-x-EGSTN3-R-x-ELIVriF3-x (S) -ELIVMF3 • 

NAME: Bacterial regulatory proteins, deoR family signature.

CONSENSUS : R-x ( 3) -ILIVM3-X ( 3) -ELI VMl-x (lb -.17) -CSTAJ-x ( 2 ) -T- 

ELIVMA3-ERH3-EKRNA3-]>- 
10 CONSENSUS : ELIVMF3 - 

NAME: Bacterial regulatory proteins, gntR family signature.

CONSENSUS: ELIVAPKR3-EPILV3-x-EEt3TIVf1R3-x ( 2 ) -ELIVI13-X < 3 ) - 

ELIVI1FYK3-X-ELIVFT3- 
15 CONSENSUS: EDNGSTK3-ERGTLVJ-x-ESTAIVP3-ELIVA3-x C2)-ESTAGV3- 

ELIVnFYH3-x(2)-ELI1A3. 

NAME: Bacterial regulatory proteins, iclR family signature.

CONSENSUS: EGA3-X (3) -EDS3-X (2 )-E-x <b >-ECSA3-ELIVM3-EGSA:i- 

20 x(2)-ELIVI13-EFYH3-E]>N3. 

NAME: Bacterial regulatory proteins, lacI family signature.

CONSENSUS: ELIVM3-x-EDE3-ELIVri3-A-x (2 ) -ESTAGV3-X-V-EGSTP3- 

x(2)-ESTAG3-ELIVI1A3-x(2)- 
25 CONSENSUS: ELIVI1FYAN3-ELIVnC3 . 

NAME: Bacterial regulatory proteins, luxR family signature.

CONSENSUS : EGDC3-X (2 ) -ENSTAVY3-X (2) -EIV3-EGSTA2-X (2 > - 

ELIVflFYli)CT3-x-ELIVI1FY!i)CR3-x (3>- 
30 CONSENSUS: ENST3-ELIVM3-X (5) -ENRHSA3-ELIVMSTA3-X (2) -EKR3 • 

NAME: Bacterial regulatory proteins, lysR family signature.

CONSENSUS : EN(2KRHSTAG3-ELIVh*FYTA3-x (2 ) -ESTAGLV3-ESTAG3-X ( M ) - 

ELIVI1YCTflR3-EPSTANLVER3- 
35 CONSENSUS: x-EPSTAG<2V3-EPSTAGNVnT3-ELIVI1FA3-ESTAGH3-x (2 ) - 

ELIVHF3-x(2)-ELIVMFU3- 

CONSENSUS: ERKEAV3-X (2) -ELIVMFYNTAE J-x (3 ) -ELII1VT3 • 

NAME: Bacterial regulatory proteins, marR family signature.

40 CONSENSUS: ESTNA3-ELIA3-X-ERNGS3-X < 1 > -ELM3-EEIV3-X (2) -EGES3- 

ELFYIO-ELIVC3-x(7>- 

CONSENSUS: EDN3-ERKAG3-ERK3-X (b ) -T-x (2 ) -EGA3 ■ 

NAME: Bacterial regulatory proteins, merR family signature.

45 CONSENSUS: EGSA3-X-ELIVMFA3-EASM3-X ( 2 ) -ESTACLIV3-EGSDENC2R3- 

ELIVC3-ESTANHK3-x(3)- 

CONSENSUS: ELIVM3-ERHF3-x-EYU3-EDE(31-x (2 -.3 ) - EGHDNd3- 

ELIVMF3(2) . 

NAME: Bacterial regulatory proteins, tetR family signature.

CONSENSUS: G-ELIVMFYS3-X (2 -. 3 > -ETS3-ELI VMT3-X ( 2) -ELIVH3-X ( 5) - 

ELIV<2S3-ESTAGEN(2H3-x- 

CONSENSUS: EGPAR3-x-ELIVMF3-EFYST3-x-EHFY3-EFV3-x-EDNST3-K- 
x(2)-ELIVI13. 


NAME: Transcriptional antiterminators bglG family signature.

CONSENSUS: EST3-x-H-x ( 2 ) -EF A3 ( 2 )-ELIV(13-EEC!K3-R-x (2 ) -E0NK3 - 
NAME: Sigma-54 factors family signature 1.

CONSENSUS: P-ELIVM3-X-ELIVM3-X ( 2 ) -ELIVMIB-A-x (S ) -ILIVMF3-X C2 ) - 

EHS3-X-S-T-ELIVM3-S-R- 

NAME: Sigma-54 factors family signature 2.

CONSENSUS : R-R-T-EIV3-EAT3-K-Y-R . 

NAME: Sigma-54 factors family profile.

NAME: Sigma-70 factors family signature 1.

CONSENSUS: EDE3-ELIVMF3 ( 2 ) -EHE(2S3-x-G-x-ELIVMFA 3-G-L- 

ELIVMFYEI-x-EGSAMJ-ILIVMAPS. 

NAME: Sigma-70 factors family signature 2.

15 CONSENSUS: ESTN3-X ( 2 ) -EDE03-ELIVM3-EGAS3-X ( M ) -ELIVMF3-tEPSTG3- 

x(3)-ELIVnA3-x-ENflR3- 

CONSENSUS: ELIVI1A3-EE(2H3-x ( 3 ) -ELIVMFU3-X ( 2 ) -ELIVH3 • 

NAME: Sigma-70 factors ECF subfamily signature.
20 CONSENSUS: EST AIV3-EPflI>EL3-EI>E3-ELIV3-ELIVTA3-t2-x-ESTAV3- 

ELIVMFYCI-ELIVMAO-x- 

CONSENSUS: EGSTAIVS-ELIMFYUflJ-x <12-,1M ) -ESTAP3-EFYU3-ELIF3- 

x(2)-EIV3- 

NAME: Sigma-54 interaction domain ATP-binding region A signature.

CONSENSUS: ELIVMFY3(3) -x-G-EDE(33-ESTE3-G-ESTAV3-G-K-x ( 2) - 

ELIVMFY3- 

NAME: Sigma-54 interaction domain ATP-binding region B signature.

CONSENSUS: EGS3-X-ELIVMF3-X ( 2 ) -A-EDNE(3ASH3-EGNEK3-G-ESTIM3- 

CLIVMFY3(3)-OE3-fl:EO- 

CONSENSUS: ELIVM3 • 


NAME: Sigma-54 interaction domain C-terminal part signature.
CONSENSUS: EFYliO-P-EGS3-N-ELIVM3-R-EE<0-L-x-ENHAT3 - 

NAME: Sigma-54 interaction domain profile.

NAME: Single-strand binding protein family signature 1- 

CONSENSUS: ELIVMF3-ENST3-EKRT3-ELIVM3-X-ELIVMF3 ( 2) -G-ENHRO- 

ELIVM3-EGST3-X-EDET3. 

45 NAME: Single-strand binding protein family signature 2- 

CONSENSUS: T-X-UI-EHY3-ERNS3-ELIVM3-X-ELIVMF3-EFY3-ENGKR3 - 

NAME: Bacterial histone-like DNA-binding proteins signature.

CONSENSUS: EGSK3-F-X (2 ) -ELIVMF3-X (M ) -ERKE(2A3-x ( 2 ) -IRST3-X- 

50 EGA3-X-EKN3-P-X-T- 

NAME: Dps protein family signature 1- 

CONSENSUS: H-EFU3-x-ELIVM3-x-G-x (5) -ELV3-H-X (3 ) -EPE3 - 

55 NAME: Dps protein family signature 2- 

CONSENSUS: ELIVMFY3-EDH3-X-ELIVM3-EGA3-E-R-X (3 ) -ELIF3-EGDN J- 

x(2)-EPA3- 
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NAME: DNA repair protein radC family signature. 

CONSENSUS: H-N-H-P-S-G- 

NAME: recA signature- 

CONSENSUS: A-L-[KR]-[IF]-[FY]-[STA]-[STAD]-[LIVMQ]-R.

NAME: RecF protein signature 1- 

CONSENSUS: P-EEIO-x ( 3 ) -ELIVMJ (2 ) -x-G-EGSADl-P-x ( 2 ) -R-R-x- 

EFY3-ELIVM3-D - 

NAME: RecF protein signature 2- 

CONSENSUS: ELIVMFY3 ( 2 ) -x-D-x (Ei 3 ) -ESAID-EEHH-L-D-x ( 2 ) -EKRHID- 

x(3)-L- 



15 NAME: RecR protein signature- 

CONSENSUS : C-x ( 2) -C-x < 3) -ESTJ-x < M )-C-x-I-C-x ( 4 > -R . 



NAME: Histone H2A signature.

CONSENSUS: [AC]-G-L-x-F-P-V.

NAME: Histone H2B signature.

CONSENSUS: [KR]-E-[LIVM]-[EQ]-T-x(2)-[KR]-x-[LIVM](2)-x-
[PAG]-[DE]-L-x-[KR]-H-A-[LIVM]-[STA]-E-G.



NAME: Histone H3 signature 1.
CONSENSUS: K-A-P-R-K-Q-L.

NAME: Histone H3 signature 2.
CONSENSUS: P-F-x-[RA]-L-[VA]-[KRQ]-[DEG]-[IV].

NAME: Histone H4 signature.
CONSENSUS: G-A-K-R-H.

NAME: HMG1/2 signature.

CONSENSUS: [FI]-S-[KR]-K-C-S-[EK]-R-W-K-T-M.

NAME: HMG-I and HMG-Y DNA-binding domain (A+T-hook).
CONSENSUS: [AT]-x(1,2)-[RK](2)-[GP]-R-G-R-P-[RK]-x.

NAME: HMG14 and HMG17 signature.
CONSENSUS: R-R-S-A-R-L-S-A-[RK]-P.



NAME: Bromodomain signature- 

45 CON^E^US: ESTAN VF J-x (2 ) -F-x ( H ) -EDNSJ-x ( 7 ) -EDENtfTF J-Y- 

EHFY3-x(2)-ELIVMFY3-x(3)- 

CONSENSUS: ELIVMJ-x ( 4 ) -ELIVM3-X ( b -,fl ) -Y-x (12 13 ) -ELIVMJ-x ( 2 ) • 

N-ESACF3-x(2)-EFYll. 

50 NAME: Bromodomain profile- 

NAME: Chromo domain signature- 

CONSENSUS: EFYL3-x-ELIVMC3-EKR3-U-x-EGDNR])-EFYliJLE!D-x ( 5 it ) - 

EST3-U-EES3-EPSTDN31-x(3)- 
55 CONSENSUS: ELIVMCJ. 

NAME: Chromo and chromo shadow domain profile- 
NAME: Regulator of chromosome condensation (RCC1) signature 1.

CONSENSUS: G-x-N-D-x(2)-[AV]-L-G-R-x-T.

NAME: Regulator of chromosome condensation (RCC1) signature 2.

CONSENSUS: ELIVMFA3-ESTAGC3 (2)-G-x(2) -H-EST AGLO-ELIVMFAS-x- 

ELIVMS- 

NAME: Protamine P1 signature.

CONSENSUS: [AV]-R-[NFY]-R-x(2,3)-[ST]-x-S-x-S.






NAME: Nuclear transition protein 1 signature.
CONSENSUS: S-K-R-K-Y-R-K.

NAME: Nuclear transition protein 2 signature 1.
CONSENSUS: H-x(3)-H-S-[NS]-S-x-P-Q-S.



NAME: Nuclear transition protein 2 signature 2.

CONSENSUS: K-x-R-K-x(2)-E-G-K-x(2)-K-[KR]-K.

NAME: Ribosomal protein L1 signature.

CONSENSUS: EIM3-X C2>-ELIVA3-x <2-.3 ) -ELIVM3-G-X C2 ) -ELMS3- 

EGSNH3-EPTKR3-EKRAV3-G-X- 
25 CONSENSUS: ELMF3-P-EDENSTK3 - 

NAME: Ribosomal protein L2 signature- 

CONSENSUS: P-x<2)-R-G-ESTAIV3<2)-x-N-EAPK3-x-E]>E3. 

30 NAME: Ribosomal protein L3 signature- 

CONSENSUS: EFL3-X (b > -EDN3-X ( 2) -EAGS3-x-EST3-x-G-EKRH3-G-x < 2> - 

G-x(3)-R- 

NAME: Ribosomal protein L5 signature.
35 CONSENSUS: ELIVM3-x(2>-ELIVM3-ESTAC3-EGE3-E(2V3-x(2)-ELIVMA3- 
X-ESTC3-X-ESTAG3-EKR3- 
CONSENSUS: X-ESTA3 • 

NAME: Ribosomal protein L6 signature 1.
40 CONSENSUS: EPS3-EDENS3-X-Y-K-EGA3-K-G-ELIVM3 - 

NAME: Ribosomal protein L6 signature 2.

CONSENSUS: <J-x ( 3 > -ELIVM3-X (2 ) -EKR3-X (2 ) -R-x-F-x-D-G-ELIVM3-Y- 

ELIVM3-x(2)-EKR3- 


NAME: Ribosomal protein L9 signature.

CONSENSUS: G-x ( 2 ) -EGN3-X ( H ) -V-x (2 ) -G-EFY3-X (2 ) -N-EFY3-L-X CS) - 

EGA3-x(3)-ESTN3- 

NAME: Ribosomal protein L10 signature.

CONSENSUS: EDEH3-X (2) -EGS3-ELIVMF3-ESTN3-.EVA3-x-EDE<20- 

ELIVMA3-x(2)-ELIM3-R- 

NAME: Ribosomal protein L11 signature.

55 CONSENSUS: ERKN3-x-ELIVM3-x-G-EST3-x ( 2 ) -ESN03-ELIVM3-G-X (2)- 

ELIVM3-X (□-.!) -EDENG3- 

NAME: Ribosomal protein L13 signature. 
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CONSENSUS: ELIVM3-EKRV3-EGK3-M-ELIV3-IPS3-X CM i S ) -EGS3- 

EN<3EKRA3-x(S)-ELIVM3-x-EAIV3- 

CONSENSUS: ELFY3-X-EGDN3 • 

NAME: Ribosomal protein L14 signature.

CONSENSUS: EGA3-ELIV3(3>-x( e 5-.lD)-EI>NS3-G-x(i|>-EFY3-x(2>-ENT3- 
x(2>-V-ELIV3- 

NAME: Ribosomal protein L15 signature.

10 CONSENSUS: K-ELIVM3 ( 5 ) -EGAL3-x-EGT3-x-ELIVMA3-x ( 2 -. S) -ELIVM3- 

x-ELIVMF3-x (3-iM ) - 

CONSENSUS: ELIVMFC3-EST3-X (2 ) -A-x ( 3) -ELIVM3-X ( 3 > -G . 

NAME: Ribosomal protein L16 signature 1.

15 CONSENSUS: EKR3-R-x-EGSAC3-EKflVA3-ELIVM3-bl-ELIVM3-EKR3- 
ELIVM3-ELFY3-EAP3 • 






NAME: Ribosomal protein L16 signature 2.

CONSENSUS : R-M-G-x-EGR3-K-G-x < 1) -EFWKR3 • 

NAME: Ribosomal protein L17 signature. 

CONSENSUS: I-x-EST3-EGT3-x(2>-EKR3-x-K-xCb)-EI>E3-x-ELIMV3- 
ELIVMT3-T-X-ESTAG3-EKR3 . 

NAME: Ribosomal protein L19 signature.

CONSENSUS : ERT3-EKRSVY3-E6SA3-X-V-IRS3-EKR3-ESA3-K-L-Y-Y-L-R • 

NAME: Ribosomal protein L20 signature.

CONSENSUS: K-x (3>-EKRC3rx-ELIVM3-UJ-EIVJ-ESTNALVJ-R-ELIVM3-N- 

30 x(3)-ERKH3- 

NAME: Ribosomal protein L21 signature- 

CONSENSUS: EIVT3-X ( 3 ) -EKR3-X ( 3 ) -IKRlO-K-x ( b ) -G-EHF3-R-ER<23- 

x(2)-T. 


NAME: Ribosomal protein L22 signature.

CONSENSUS : ERK(2N3-x (M ) - ERH3-EGAS3-x-G-EKRt2S3-x ) -EHDN3- 

elivm3-x-elivms3-x-elivm3- 

40 NAME: Ribosomal protein L23 signature- 

CONSENSUS : ERK3 (2 ) -EAM3-EIVFYT3-EIV3-ERKT3-L-ESTAN«2K3-x ( 7 ) - 

ELIVMFT3- 

NAME: Ribosomal protein L24 signature.
45 CONSENSUS: EGDEN3-D-X-V-X-EIV3-ELIVMA3-X-G-X < 2 ) -EKA3-EGN3- 

x(2,3)-EGA3-x-EIV3. 

NAME: Ribosomal protein L27 signature- 
CONSENSUS : G-X-ELIVM3 C2 > -x-R-fl-R-G-x < 5 ) -G - 

NAME: Ribosomal protein L29 signature.

CONSENSUS: EKN<2S3-EPSTL3-x(2)-ELIMFAJ-EKRGSAN3-x-ELIVYSTA3- 
EKR3-EKRH3-EDESTANRL3- 

CONSENSUS: ELIV3-A-EKRCUVT3-ELIVMA3 - 

NAME: Ribosomal protein L30 signature.

CONSENSUS: EIVT3-ELIVM3-X C2)-ELF3-x-ELI3-x-EKRH(2EG3-x(2) - 

CSTN(2H3-x-EIVT3- 
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CONSENSUS: x ( 10 ) -ELMSJ-ELIVJ-x (5 ) -ELIVAD-x ( S ) -ELMFYD-EIVTD • 

NAME: Ribosomal protein L31 signature. 

CONSENSUS : H-P-F-CFY3-ltTI3-x( E J)-G-R-n;AV3-x-CKR3. 


NAME: Ribosomal protein L33 signature- 

CONSENSUS: Y-x-CST3-x-CKR3-IENS3-x ( ^ -EPATl-x ( 1 ->2 ) -ELIVMI- 

IHEAI-x C 2 ) -K-EFYID-IECSDID • 

NAME: Ribosomal protein L34 signature.

CONSENSUS: K-CRGI-T-EFYULD-IEEflSJ-x ( 5 ) -CKRHSI-x ( M -i5) -G-F-x (2 ) - 

R. 

NAME: Ribosomal protein L35 signature.

15 CONSENSUS: CLIVnJ-K-CTVJ-x (H) -CGSA3-ESAIL3-X-K-R-ELIVI1FY3- 

EKRLJ • 

NAME: Ribosomal protein L36 signature.

CONSENSUS : C-x ( 2 ) -C-x <2 ) -ELIVM3-x-R-x ( 3) -ELIVMNJ-x-ICLIVnni-x- 

20 C-x(3-.M)-EKR3-H-x-C-x-fl. 

NAME: Ribosomal protein L1e signature.

CONSENSUS : N-x ( 3) -EKRl-x (S > -A-ELIVTJ-x-S-A-CLIVJ-x-A-ESTJ- 

ESGA3-x(7)-ERKJ-G-H. 
NAME: Ribosomal protein L6e signature.

CONSENSUS : N-x ( 2 ) -P-L-R-R-x ( -EFYH-V-I- A-T-S-x-K . 



NAME: Ribosomal protein L7Ae signature- 

30 CONSENSUS: ECADl-x ( M) -EIVU-P-EFYID-x (2 ) -ELIVt13-x-EGS(33-EKR(23- 

x(2)-L-G- 

NAME: Ribosomal protein L10e signature.

CONSENSUS : R-x-A-EFYlO-G-K-EPAH-x-G-x (2) -A-R-V - 



NAME: Ribosomal protein L13e signature- 

CONSENSUS : EKR3-Y-X (5 ) -K-ELIVMll-R-ICSTAll-G-IEKRJ-G-F-IESTJ-L-x- 

E- 



NAME: Ribosomal protein L15e signature.

CONSENSUS: EDE3-EKR3-A-R-x-L-G-EFY3-x-ESAP3-x(S)-G- 
ELIVMFY3(M)-R-x-R-V-x-R-G. 

NAME: Ribosomal protein L18e signature.

45 CONSENSUS: EKREJ-x-L-x ( 2 ) -EPS3-EKR3-X < 2 ) -ERH3-EPSA3-X-ELIVI13- 

ENSJ-ffLIVMJ-x-IHRO- 
CONSENSUS: ELIVM3 • 

NAME: Ribosomal protein L19e signature.

50 CONSENSUS: R-x-EKRJ-x ( S) -EKRJ-x (3) -EKRHI-x (S) -G-x-G-x-R-x-G- 

x(3)-A-R-x(3)-EK<23- 

CONSENSUS: x < 2 > -U-x C7 > -R-x < 2) -L-x <3> -R - 

NAME: Ribosomal protein L21e signature- 

55 CONSENSUS: G-EDED-x-V-x ( 10 ) -EGVJ-x (2 ) -EFYHH-x ( 2 ) -EFY3-x-G-x- 

T-G- 

NAME: Ribosomal protein L24e signature.
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CONSENSUS: IFY3-X-EGS3-X ( 2 ) -EIV3-x-P-G-x-G-x ( E ) -EFYV3-X- 

CKRHE3-X-D. 

NAME: Ribosomal protein L27e signature.

CONSENSUS: G-K-N-x-W-F-F-x-K-L-R-F.

NAME: Ribosomal protein L30e signature 1.

CONSENSUS: ESTAD-x ( 5) -G-x-E<JKR3-x (E ) -ELIVM3-EK(3T3-x ( E ) -EKR3- 

x-G-x(E)-K-x-ELIVri3(3> • 

NAME: Ribosomal protein L30e signature 2.

CONSENSUS: OE3-L-G-ESTA3-X (E ) -G-EKR3-X ( b ) -ELIVPD-x-ELIVrO-x 

EDENJ-x-G- 

15 NAME: Ribosomal protein L31e signature- 

CONSENSUS: V-EKR3-ELIVI13-X (3 > -ELIVI13-N-x-EAO-x-U-x-EKR3-G - 

NAME: Ribosomal protein L32e signature.

CONSENSUS: F-x-R-x ( W ) -EKR3-X C E> -EKR3-ELIVM J-x ( 3 > -td-R-EKR3- 

20 x(E)-G. 

NAME: Ribosomal protein L34e signature.
CONSENSUS: Y-x-EST3-x-S-ENY3-x C 5) -EKR3-T-P-G - 

25 NAME: Ribosomal protein L35Ae signature- 

CONSENSUS: G-K-ELIVM J-x-R-x-H-G-x CE ) -G-x-V-x-A-x-F-x ( 3 ) -ELI3 

P- 

NAME: Ribosomal protein L36e signature.
30 CONSENSUS: P- Y-E-EKR3-R-x-ELIVM3-E]>E3-ELIVn3 (E ) -EKR3 • 

NAME: Ribosomal protein L37e signature- 

CONSENSUS: G-T-x-ESA3-x-G-x-CKR3-x ( 3 ) -EST3-X (0-.1) -H-x < E ) -C-x 

R-C-G- 

NAME: Ribosomal protein L39e signature.

CONSENSUS: EKRA3-T-X <3>-ELIVM3-EKRi2F3-x-ENHS3-x <3>-R-ENHY3-U 

R-R . 

NAME: Ribosomal protein L44e signature.

CONSENSUS: K-x-ETV3-K-K-x (2 ) -L-IKR3-X ( £) -C - 

NAME: Ribosomal protein S2 signature 1.

CONSENSUS: ELIVHFA3-x<2)-ELIVMFYC3(2>-x-ESTAC3-EGSTAN(2EKR3- 
45 ESTALV3-EHY3-ELIVMF3-G - 

NAME: Ribosomal protein S2 signature 2.

C0NSENSUS: P-x (S ) -ELIVI1F3 ( E) -ELIVI1S3-X-EGDN3-X ( 3) -EDENL3- 

x(3>-ELIVn3-x-E-x(M)- 
50 CONSENSUS: EGN<2KRH3-CLIVn3-EAP3 . 

NAME: Ribosomal protein S3 signature- 

CONSENSUS: EGSTA3-EKR3-x(b) -G-x-ELIVHTH-x (2)-EN<2SCH3-x(l-i3>- 

ELIVFCA3-x(3)-ELIV3- 
55 CONSENSUS: EDEN(23-x < 7 ) -ELI1T3-X (2) -G-x < E> -G - 

NAME: Ribosomal protein S4 signature.
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CONSENSUS : ELI VfO-EDEID-x-R-L-x ( 3 ) -ELIVNCJ-EVriF YHlO-EKRT])- 

x(3)-ESTAGCF]]-x-EST:B-x(3)- 

CONSENSUS: ESAIJ-EKRIB-x-ELIVHFlI < 2) - 

NAME: Ribosomal protein S5 signature.

CONSENSUS: G-EKRfll-x <3)-EFY3-x-EACVID-x(2)-ELIVriAni-ELIVrO- 

EAGID-EI>N:D-x(2)-G-x- 

CONSENSUS: ELIVn3-G-x-ESAG3-x(5-ib)-El>Efl3-'ELIVri3-x(2)-A- 
ELIVMF3- 

NAME: Ribosomal protein S6 signature.

CONSENSUS: G-x-EKRC J-EDEN(3RHID-L-ESA 3-Y-x-I-EKRNSA 3 - 



NAME : Ribosomal protein S7 signature. 

15 CONSENSUS: EDENSK3-X-ELIVI1ET3-X (3) -ELIVMFT3 (2) -x Cb ) -G-K-EKR3- 

x(5)-ELIVnF3-ELIVriFC3- 
CONSENSUS: x (2 ) -ESTA3 - 

NAME: Ribosomal protein S8 signature.

20 CONSENSUS: EGE3-X ( 2 ) -ELIV3 ( 2 ) -ESTY3-T-X (2 ) -G-ELIVH3 ( 2 ) -x ( M ) - 

EAG3-EKRHAYI3- 

NAME: Ribosomal protein S9 signature.

CONSENSUS: G-G-G-x ( 2 ) -EGSAID-fl-x ( 2 ) -ESA3-X (3 ) -EGSA3-X-EGSTAV 3- 

25 EKR3-EGSAL3-ELIF3- 

NAME: Ribosomal protein S10 signature.

CONSENSUS: EAV3-x(3)-EGDNSR3-ELIVriSTA3-x(3)-G-P-ELIVM3-x- 
ELIVI13-P-T. 


NAME: Ribosomal protein S11 signature.

CONSENSUS: ELI VMF3-X-EGST AC J-ELIVMF J-x ( 2 ) -EGST AL3-X ( □ n 1) - 

EGSN3-ELIVI1F3-x-ELIVri3- 

CONSENSUS: x ( H ) -EDEN3-X-T-P-X-EPA3-ESTCH3-EDNID - 

NAME: Ribosomal protein S12 signature.

CONSENSUS: [RK]-x-P-N-S-[AR]-x-R.

NAME : Ribosomal protein S13 signature- 

40 CONSENSUS: EKR<3S3-G-x-R-H-x ( 2 ) -EGSNH3-X ( 2 ) -ELIVMC3-R-G-<2 - 

NAME: Ribosomal protein S14 signature.

CONSENSUS: ERP3-X ( Oil) -C-x (11-. 12) -ELIVMF:D-x-ELIVriF3-ESC3- 

ERG3-x(3)-ERN3- 


NAME: Ribosomal protein S15 signature.

CONSENSUS: ELIVI13-X (2 ) -H-ELIVI1FY3-X < 5 ) -D-x (2 ) -ESAGN3-X ( 3 ) - 

ELF3-xCT>-ELIVM3-x(2>- 

C0NSENSUS: EFY3- 

NAME: Ribosomal protein S16 signature.

CONSENSUS: ELIVHT3-X-ELIVM 3-EKR3-L-EST AK J-R-X-G-EAKR3 - 

NAME: Ribosomal protein S17 signature- 

55 CONSENSUS: G-D-x-ELIV3-x-ELIVA3-x-E(2EK3-x-ERK 3-P-ELIV3-S . 



NAME: Ribosomal protein S18 signature.
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CONSENSUS: EIV D-EDYl-Y-x ( E ) -ELIVMTD-x ( E ) -ELIVMU-x ( E ) -EFYTID 

ELIVMID-ESTID-EDERPID-x- 

CONSENSUS: EGY3-K-ELIVM:i-x ( 3 ) -R-ELIVMAS3 - 

NAME: Ribosomal protein S19 signature.

CONSENSUS : ESTDNd J-G-EKRdMUI-x ( b ) -ELIVMD-x ( M ) -ELIVM3-EGSD 3- 

x(S)-CLF]i-a:GAS3-a:PEJ-F- 

CONSENSUS: x(E)-CSTJ. 

NAME: Ribosomal protein S21 signature.

CONSENSUS: EDE J-x-A-ELY J-EKR3-R-F-K-EKRn]-x ( 3 > -EKR3 - 
NAME: Ribosomal protein S3Ae signature.
CONSENSUS: ELIVl-x-EGH 3-R-EIV J-x-E-x-ESO-L-x-D-L - 

NAME: Ribosomal protein S4e signature.

CONSENSUS: H-x-K-R-ELIVMIB-ESAN J-x-P-x ( E ) -U-x-ELIVM J-x-EKRl 



NAME: Ribosomal protein S6e signature.
20 CONSENSUS: ELIVMIB-ESTAMRHI-G-G-x-D-x ( E ) -G-x-P-M - 

NAME: Ribosomal protein S7e signature- 
CONSENSUS: EKRJ-L-x-R-E-L-E-K-K-F-ESAP J-x-EKRH-H • 

NAME: Ribosomal protein S8e signature.

CONKEN^US: R-x (E ) -T-G-EGAIB-x ( 5) -EHR3-K-IEKR J-x-K-x-E-ELrO-G 



NAME: Ribosomal protein S12e signature.

CONSENSUS: A-L-EKRflPJ-x- V-L-x ( E ) -ESAID-x ( 3 ) -EDN3-G-L . 

NAME: Ribosomal protein S17e signature- 

CONSENSUS : A-x-I-x-EST J-K-x-L-R-N-EKR J-I- A -G-EF YJ-x-T-H . 



NAME: Ribosomal protein S19e signature.

35 CONSENSUS: P-x ( b > -ESAN3-X < S ) -ELI VMA3-x-R-x-EALIV3-EL VJ-<2-x 

EEO- 



NAME: Ribosomal protein S21e signature.

CONSENSUS: L-Y-V-P-R-K-C-S-[SA].

NAME: Ribosomal protein S24e signature.

CONSENSUS: EFAJ-G-x (E ) -EKRJ-ESTAH-x-G-EFYID-EGAID-x-ELIVMIII-Y 

EDN3-ESN3 - 

NAME: Ribosomal protein S26e signature.
CONSENSUS: [YH]-C-V-S-C-A-I-H.

NAME: Ribosomal protein S27e signature.

CONSENSUS: EAO-C-x ( E ) -C-x (b ) -F-EGS3-x-EPSA3-x ( 5 ) -C-x ( E ) -C 

50 EGSn)-x(E)-L-xCE>-P-x-G. 

NAME: Ribosomal protein S28e signature.

CONSENSUS: E-[ST]-E-R-E-A-R-x-L.

NAME: DNA mismatch repair proteins mutL / hexB / PMS1 signature.

CONSENSUS: G-F-R-G-E-A-L.
NAME: DNA mismatch repair proteins mutS family signature.

CONSENSUS: ESTiD-ELIVMID-x-ELIVHID-x-D-E-ELI VMYJ-EGO-ERKHJ-G- 

EGSTJ-x<iO-G. 

NAME: mutT domain signature.

CONSENSUS : G-x ( 5) -E-x ( M ) -ESTAGC 1-ELIVM ACJ-x-R-E-ELIVMFTJ-x-E- 

E • 

NAME: DnaA protein signature. 

10 CONSENSUS : I-EGAH3-X ( 2 ) -ELIVMFID-ESGDNO-x ( 0 1) -EKRD-x-H-ESTPJ- 

CSTVni-D:LIVf13(2)-x- 

CONSENSUS: ESA 1-x ( 2 ) -EKREl-ELIVMHi - 

NAME: Small, acid-soluble spore proteins, alpha/beta type, signature 1.

CONSENSUS : K-x-E-ELIV J-A-x-EDEHl-ELIVMFl-G-ELIVMFJ - 

NAME: Small, acid-soluble spore proteins, alpha/beta type, signature 2.

20 COU^E^USi EKRJ-ESA(3J-x-G-x-V-G-C-x-ELIVMJ-x-EKR3(2>- 
ELIVMJ(2) - 






NAME: Zinc-containing alcohol dehydrogenases signature.

CONSENSUS: G-H-E-x(2)-G-x(5)-[GA]-x(2)-[IVSAC].

NAME: Quinone oxidoreductase / zeta-crystallin signature.
CONSENSUS : EGSD J-EDEflHJ-x ( 2 ) -L-x ( 3 ) -ESA 1 ( 2 ) -G-G-x-G-x ( M ) -12- 

x(2)-EKR3- 

NAME: Iron-containing alcohol dehydrogenases signature 1.
CONSENSUS: CSTALIV3-II!LIVFll-x-EI>E3-x(tjT7)-P-x( l 4)-I[;ALIV3-x- 
CGSTJ-x(2)-l)-CTAIVn3- 
CONSENSUS: ELIVMF3-X (H ) -E • 

NAME: Iron-containing alcohol dehydrogenases signature 2.

CONSENSUS: EGSUl-x-ELIVTSACDJ-IGHJ-x (2 ) -EGSAEI-IEGSHYtO-x- 

CLIVTP3-EGAST3-EGAS3-x(3>- 

CONSENSUS: ELIVMTiB-x-EHNSll-EGAID-x-IEGTAO • 

NAME: Short-chain dehydrogenases/reductases family signature.

CONSENSUS : ELIVSPADNO-x ( IS ) -Y-EPSTAGNCVU-ESTAGNdKIVMH- 

ESTAGO-K-CPO- ESAGFR3- 

C0NSENSUS: ELIVMSTAGDD-x ( S ) -ELIVMFYtO-x (3 ) -ELIVnTYUGAPTHO- 

45 EGSAC(JRHM3 - 

NAME: Aldo/keto reductase family signature 1.

CONSENSUS : G-EFY3-R-EHSAL3-ELIVnF3-D-ESTAGC3-CAS3-x(S)-E- 

x(B)-ELIVnH-G- 


NAME: Aldo/keto reductase family signature 2.

CONSENSUS : ELIVMFYJ-x ( 1) -EKREfO-x-ELIVMli-G-ELIVMll-ESO-N- 

EFYJ - 

NAME: Aldo/keto reductase family putative active site signature.

CONSENSUS: ELIVM3-EPAIV3-EKR3-EST3-X (-» ) -R-x (B) -EGSTAElJO- 

ENSL3-x(2)-ELIVI1FA3. 
NAME: Homoserine dehydrogenase signature.

CONSENSUS: A-x ( 3) -G-ELIVMFYHI-ESTAGJ-x ( 2 -.3) -EDNSID-P-x (E ) -D- 

ELIVM3-x-G-x-D-x<3)-K. 


NAME: NAD-dependent glycerol-3-phosphate dehydrogenase signature.

CONSENSUS: G-E AT3-ELI VMHJ-K-ONID-ELIVM:]) C£ ) -A-x-EGAl-x-G- 

ELIVMFl-x-EDEH-G-ELIVMID-x- 
10 CONSENSUS : ELIVMFYtO-G-x-N - 

NAME: FAD-dependent glycerol-3-phosphate dehydrogenase signature 1.

CONSENSUS: [IV]-G-G-G-x(2)-G-[STACV]-G-x-A-x-D-x(3)-R-G.

NAME: FAD-dependent glycerol-3-phosphate dehydrogenase signature 2.

CONSENSUS: G-G-K-x(2)-[GSTE]-Y-R-x(2)-A.



NAME: Mannitol dehydrogenases signature.

CONSENSUS : ELIVMY ID-x-EFSl-x ( E ) -ESTAGCVD-x- V-D-R-EIVID-x-EPSHI - 

NAME: Histidinol dehydrogenase signature- 

CONSENSUS : I-D-x (E ) -A-G-P-EST3-E-ELIVS 3-ELIVMAID ( 3) -EAC J-x ( 3 ) « 

25 A-x(M)-ELIVMJ-EAVJ- 

CONSENSUS: ESACLJ-EDE3-ELIVMFC3-ELIVM3-ESA J-x ( E ) -E-H - 



NAME: L-lactate dehydrogenase active site.

CONSENSUS: [LIVMA]-G-[EQ]-H-G-[DN]-[ST].



NAME: D-isomer specific 2-hydroxyacid dehydrogenases NAD-binding signature.

CONSENSUS: ELIVMA3-EAG3-EIVTJ-ELIVMFY:B-EAGJ-x-G-ENHKRt3GS ACID- 

IC LI V3-G-X C 13 -.1M > - 
35 CONSENSUS: ELIVf MT3-X ( E ) -EFYwCTH3-EDNSTK3 . 

NAME: D-isomer specific 2-hydroxyacid dehydrogenases signature 2.

CONSENSUS: ELIVMFYUA3-ELIVFYUC3-X (E) -ESAC J-EDNdHR J-EIVFA3- 

40 ELIVF3-X-ELIVF3-IHNI3-X- 

CONSENSUS: P-x ( M ) -ESTN3-X ( E ) -ELIVMF3-X-EGSDN3 - 

NAME: D-isomer specific 2-hydroxyacid dehydrogenases signature 3.

45 CONSENSUS: ELMFATC J-EKP<23-x-EGSTDN3-x-ELIVMFYli)R3- 

ELIVMFYU3C2)-N-x-ESTAGC3-R-EGPJ-x- 
CONSENSUS: ELIVH3-ELIVMC3-EDNV3 - 

NAME: 3-hydroxyisobutyrate dehydrogenase signature. 

50 CONSENSUS: ELIVMFY3 ( S ) -G-L-G-x-EM<2 3-G-X-EPGS3-EMA3-ESAJ . 

NAME: Hydroxymethylglutaryl-coenzyme A reductases signature 1.

CONSENSUS: ERKH3-X (b ) -D-x-M-G-x-N-x-ELIVMA 1 - 



NAME: Hydroxymethylglutaryl-coenzyme A reductases signature 2.

CONSENSUS: [LIVM]-G-x-[LIVM]-G-G-[AG]-T.

NAME: Hydroxymethylglutaryl-coenzyme A reductases signature 3.

CONSENSUS: A-ELIVfO-x-ISTANJ-x ( 2 ) -IL.I3-x-EKRN<33-EGSA3-H-ELt13- 
5 X-EFYLH3 • 

NAME: Hydroxymethylglutaryl-coenzyme A reductases profile.

NAME: 3-hydroxyacyl-CoA dehydrogenase signature.

10 CONSENSUS : IDNE3-X ( 2 ) -IGA3-F-ELIVI1FY3-x-ENT3-R-x ( 3 ) -CPA3- 
ELIVMFY3(2)-x(S)- 

CONSENSUS: ELIVUFYCTS-ILIVflFYS-x < 2 > -EGV3 • 

NAME: Malate dehydrogenase active site signature. 

15 CONSENSUS: ELIVM3-T-ETRKI1N3-L-]>-x (2 ) -R-ESTA3-X (3 ) -ELIVI1FY3 - 

NAME: Malic enzymes signature.

CONSENSUS: F-X-EDV3-D-X ( 2 ) -G-T-EGSA3-X-EIV3-X-ELIVMA3- 
EGAST3C2>-ELIVI1F3<2> - 


NAME: Isocitrate and isopropylmalate dehydrogenases signature.

CONSENSUS : ENS3-ELIMYT3-EFYI>N3-G-E]>NT3-EIh'VY3-x-ESTGI>N3-EDN3- 
x(2)-ESGAP3-x(3-.M)-G- 

25 CONSENSUS: ESTG3-ELIVMPA3-G-ELIVMF3 - 

NAME: 6-phosphogluconate dehydrogenase signature.

CONSENSUS: ELIVfO-x-D-x ( 2 ) -EGA3-ENflS3-K-G-T-G-x-.U - 

NAME: Glucose-6-phosphate dehydrogenase active site.

CONSENSUS: D-H-Y-L-G-K-[EQ].

NAME: IMP dehydrogenase / GMP reductase signature.

CONSENSUS: [LIVM]-[RK]-[LIVM]-G-[LIVM]-G-x-G-S-[LIVM]-C-x-T.


NAME: Bacterial quinoprotein dehydrogenases signature 1.

CONSENSUS: EDEN3-U-X < 3 ) -G-ERK3-X ( b) -EFYU3-S-X ( 4 ) -ELIVM3-N- 
xC2)-N-V-x(2)-L-ERK3. 

NAME: Bacterial quinoprotein dehydrogenases signature 2.

CONSENSUS: W-x(4)-Y-D-x(3)-[DN]-[LIVMFY](4)-x(2)-G-x(2)-[STA]-P.

NAME: FMN-dependent alpha-hydroxy acid dehydrogenases active site.

CONSENSUS: S-N-H-G-[AG]-R-Q.

NAME: GMC oxidoreductases signature 1.

C0NSENSUS: EGA3-ERKN3-X-ELIV3-G (2 ) -EGST3 (2 ) -X-ELIVM3-N-X (3 ) - 
50 EFYUA3-x(2)-EPAG3-x(S)- 

CONSENSUS: EDNESH3 • 

NAME: GMC oxidoreductases signature 2.

CONSENSUS: EGS3-EPSTA3-X ( 2) -EST3-P-X-ELIVM3 (2) -x (2) -S-G- 
55 ELIVM3-G- 

NAME: Eukaryotic molybdopterin oxidoreductases signature. 
CONSENSUS ' EGA3-x< 3) -EKRN(2HT3-x (11,110 -ELIVhTYliJSS-x Cfl) - 

ELIVMF3-x-C-x(£)-El>EN3-R- 

CONSENSUS: x(E>-EI>E3. 

NAME: Prokaryotic molybdopterin oxidoreductases signature 1.

CONSENSUS: ESTAN3-x-[i:CH3-x ( E -. 3) -C-ESTAG3-EGSTVMF3-X-C-X- 

CLIVMFYU J-X-ELIVMA3-X ( 3-. 4 > - 
CONSENSUS: EDENCKHT3 • 

NAME: Prokaryotic molybdopterin oxidoreductases signature 2.

CONSENSUS : ESTA3-X-ESTAC3 (E>-x<E)-ESTA3-D-ELIVMY3 (E)-L-P-x- 

ESTAC3(E)-x(E>-E- 

NAME: Prokaryotic molybdopterin oxidoreductases signature 3. 

15 CONSENSUS: A-x (3 ) -EGDT3-I-x-E]>N<2TK3-x-E]>EA3-x-ELIVI13-x- 

ILIVMC3-x-ENS3-x(E)-EGS3- 
CONSENSUS: x(5)-A-x-ELIVI13-EST3. 

NAME: Aldehyde dehydrogenases glutamic acid active site- 

20 CONSENSUS: CLIVhTGA3-E-ELinSTAC3-EGS3-G-EKNLI13-CSAI>N3- 
ITAPFV3 • 

NAME: Aldehyde dehydrogenases cysteine active site- 

CONSENSUS: EFYLVA3-X ( 3 ) -G-E(2E3-x-C-ELIVMGSTANC3-IEAGCN3-x- 

25 EGSTADNEKR3 • 

NAME: Aspartate-semialdehyde dehydrogenase signature- 

CONSENSUS: CLIVM3-ESADN3-X ( E ) -C-X-R-ELIVM3-X ( 4 ) -CGSC3-H- 

ESTA3 • 

NAME: Glyceraldehyde 3-phosphate dehydrogenase active site.

CONSENSUS : EASV3-S-C-ENT3-T-X (E ) - ELH13 - 
NAME: N-acetyl-gamma-glutamyl-phosphate reductase active site.

CONSENSUS : ELIVM3-EGSA3-x-P-G-C-EFY3-EAVP3-T-EGA3^x (3 ) - 
CGTAC3-ILIVH3-X-P. 

NAME: Gamma-glutamyl phosphate reductase signature.

40 CONSENSUS: V-x ( S) -A-ELIV3-x-H-I-x (E) -ITHY3-IGS3-IST3-X-H-EST3- 
EDE3-X-I. 

NAME: Dihydrodipicolinate reductase signature.

CONSENSUS: E-[IV]-x-E-x-H-x(3)-K-x-D-x-P-S-G-T-A.


NAME: Dihydroorotate dehydrogenase signature 1.

CONSENSUS: EGS3-x(4)-EGK3-ESTA3-EIVSTA3-EGT3-x<3)-EN<2R3-x-G- 

ENH3-x(E)-P-ERT3- 

NAME: Dihydroorotate dehydrogenase signature 2.

CONSENSUS: [LIV](2)-[GSA]-x-G-G-[IV]-x-[STGN]-x(3)-[ACV]-x(6)-G-A.

NAME: Coproporphyrinogen III oxidase signature- 

55 CONSENSUS: K-x-li)-C-x ( £ ) -EFYH3 (3) -ELIVM3-x-H-R-x-E-x-R-G- 

ELIVI13-G-G-ELIVri3-F-F-]>. 
NAME: Fumarate reductase / succinate dehydrogenase FAD-binding site.

CONSENSUS: R-[ST]-H-[ST]-x(2)-A-x-G-G.

NAME: Acyl-CoA dehydrogenases signature 1.

CONSENSUS: [GAC]-[LIVM]-[ST]-E-x(2)-[GSAN]-G-[ST]-D-x(2)-[GSA].

NAME: Acyl-CoA dehydrogenases signature 2.

CONSENSUS: [QDE]-x(2)-G-[GS]-x-G-[LIVMFY]-x(2)-[DEN]-x(4)-[KR]-x(3)-[DEN].

NAME: Alanine dehydrogenase & pyridine nucleotide transhydrogenase signature 1.
15 CONSENSUS: G-ELIVM3-P-x-E-x(3)-N-E-x(1t3)-R-V-A-x-ESTJ-P-x- 
EGSTJ-V-x(2)-L-x-EKRHJ- 
CONSENSUS: x-G- 

NAME: Alanine dehydrogenase & pyridine nucleotide transhydrogenase signature 2.

CONSENSUS: ELIVMJ ( 2 ) -G-EGA J-G-x-A-G-x ( 2 > -ESAJ-x (3)-EGAJ-x- 

ESGJ-ELIVJO-G-A-x-V- 

CONSENSUS: x(3)-D- 

NAME: Glu / Leu / Phe / Val dehydrogenases active site.

CONSENSUS: [LIV]-x(2)-G-G-[SAG]-K-x-[GV]-x(3)-[DNST]-[PL].
NAME: D-amino acid oxidases signature.

CONSENSUS: [LIVM](2)-H-[NHA]-Y-G-x-[GSA](2)-x-G-x(5)-G-x-A.

NAME: Pyridoxamine 5'-phosphate oxidase signature.
CONSENSUS: [LIVF]-E-F-W-[QHG]-x(4)-R-[LIVM]-H-[DNE]-R.



NAME: Copper amine oxidase topaquinone signature.

CONSENSUS: [LIVM]-[LIVMA]-[LIVM]-x(4)-T-x(2)-N-Y-[DE]-[YN].

NAME: Copper amine oxidase copper-binding site signature.

CONSENSUS: T-x-G-x(2)-H-[LIVMF]-x(3)-E-[DE]-x-P.

NAME: Lysyl oxidase putative copper-binding region signature.

CONSENSUS: W-E-W-H-S-C-H-Q-H-Y-H.

NAME: Delta 1-pyrroline-5-carboxylate reductase signature.
45 CONSENSUS: EPALF J-x ( 2 3 ) -ELIVJ-x (3 ) -ELIVI1I-EST ACJ-ESTV J-x- 

EGAN3-G-x-T-x<2)-EAGJ- 

CONSENSUS: ELIVJ-x < 2) -ELMF J-EDEN(2KJ . 

NAME: Dihydrofolate reductase signature.

50 CONSENSUS: ELVAGC J-ELIF J-G-x ( M ) -ELIVMF J-P-U-x < H -, 5 ) -EDE J-x (3 ) - 

EFYIVJ-x(3)-ESTIt3J. 

NAME: Tetrahydrofolate dehydrogenase/cyclohydrolase signature 1.

55 CONSENSUS: EES J-x-EEflK J-ELIVN J ( 2 ) -x ( 2 ) -ELIVfl J-x ( 2 ) -ELIVI1Y J-N- 

x-EDNJ-x(5)-ELIVf1FJ(3)- 
CONSENSUS: tf-L-P-ELVJ- 
NAME: Tetrahydrofolate dehydrogenase/cyclohydrolase signature 2.

CONSENSUS: P-G-G- V-G-P-ILNFJ-T-iri V3 - 

NAME: Oxygen oxidoreductases covalent FAD-binding site.

CONSENSUS: P-x (ID ) -OEID-ELIVrO-x ( 3 ) -CLIVfO-x ( 1 ) -ELIVM3-X ( 3 ) - 

EGSA3-IEGST3-G-H- 

NAME: Pyridine nucleotide-disulphide oxidoreductases class-I active site.

CONSENSUS: G-G-x-C-[LIVA]-x(2)-G-C-[LIVM]-P.

NAME: Pyridine nucleotide-disulphide oxidoreductases class-II active site.

15 CONSENSUS: C-x(2)-C-D-EGA3-x(2t4)-EFY3-x(4) -ELIVfO-x- 

ELivnj(S)-GC3)-Cl>Nll- 

NAME: Respiratory-chain NADH dehydrogenase subunit 1 

signature 1- 

20 CONSENSUS: G-ELI VI1FYKRS3-ELIVMAGP3-(2-x-IELI VMF Y3-x-D-EAGiri3- 

!ELIVriFTA3-K-ELVMYST3- 

CONSENSUS: ELIVMFYG3-x-IEKR3-IEE<3G3 . 

NAME: Respiratory-chain NADH dehydrogenase subunit 1 

25 signature 2. 

CONSENSUS: P-F-D-ELI VHFY<0-ESTA6PVIi:D-E-EGAC3-E-x-EE<0- 

ELIVHS3-x(2)-G. 

NAME: Respiratory-chain NADH dehydrogenase 20 Kd subunit signature.

CONSENSUS: EGN3-X-D-EKRST3-ELIVMF3 ( 2 ) -P-EIV3-D-ELI VMFYLO < 2 ) - 

x-P-x-C-P-CPTl. 

NAME: Respiratory-chain NADH dehydrogenase 24 Kd subunit signature.

CONSENSUS: D-x(2)-F-[ST]-x(5)-C-L-G-x-C-x(2)-[GA]-P.

NAME: Respiratory chain NADH dehydrogenase 30 Kd subunit signature.

40 CONSENSUS: e-R-E-x (2) -EDE3-ELIVMF3 (2) -x (b ) -EHK3-x( 3) -EKRP3-X- 

ELIVrO-ELIVIISJ. 

NAME: Respiratory chain NADH dehydrogenase 49 Kd subunit signature.

45 CONSENSUS: ELIVMH3-H-ERT3-EGA3-x-E-K-ELIVnT3-x-E-x-EKR(0 . 

NAME: Respiratory-chain NADH dehydrogenase 51 Kd subunit 

signature 1- 

CONSENSUS: G-EAM3-G-EAR3-Y-ELIVI13-C-G-EDE3 C2) -ESTA3 (2) - 

50 ELII13C2)-EEN3-S. 

NAME: Respiratory-chain NADH dehydrogenase 51 Kd subunit signature 2.

CONSENSUS: E-S-C-G-x-C-x-P-C-R-x-G.






NAME: Respiratory-chain NADH dehydrogenase 75 Kd subunit signature 1.

CONSENSUS: P-x(2)-C-[YWS]-x(7)-G-x-C-R-x-C.

NAME: Respiratory-chain NADH dehydrogenase 75 Kd subunit signature 2.

CONSENSUS: C-P-x-C-[DE]-x-[GS](2)-x-C-x-L-Q.


NAME: Respiratory-chain NADH dehydrogenase 75 Kd subunit signature 3.

CONSENSUS: R-C-[LIVM]-x-C-x-R-C-[LIVM]-x-[FY].

NAME: Nitrite and sulfite reductases iron-sulfur/siroheme-binding site.

CONSENSUS: ESTV J-G-C-x ( 3 ) -C-x ( b ) -OEJ-ELIVMFJ-EGATJ-ELIVMFll - 

NAME : Uricase signature. 

15 CONSENSUS : L-x-EL Vm-L-K-ESTH-T-x-S-x-F-x ( 2 ) -EFYD-x ( M ) -EFY3 . 

NAME: Heme-copper oxidase catalytic subunit, copper B binding region signature.

CONSENSUS: EYUGl-ELIVFYUTAIB ( 2 ) -EVGSJ-H-ELNPH-x-V-x ( 4M n M7 ) -H- 

20 H. 

NAME : CO II and nitrous oxide reductase dinuclear copper 

centers signature- 

CONSENSUS: V-x-H-x ( 33 MO ) -C-x ( 3 ) -C-x ( 3 ) -H-x ( 2 ) -11 - 


NAME: Cytochrome c oxidase subunit Vb, zinc binding region signature.

CONSENSUS: [LIVM](2)-[FYW]-x(10)-C-x(2)-C-G-x(2)-[FY]-K-L.

30 NAME: Multicopper oxidases signature 1- 

CONSENSUS: G-x-EFYW J-x-ELIVMFYIiO-x-ECSTa-x ( A ) -G-ELMJ-x ( 3 ) - 

ELIVMFYIiO. 

NAME: Multicopper oxidases signature 2.
CONSENSUS: H-C-H-x(3)-H-x(3)-[AG]-[LM].

NAME: Peroxidases proximal heme-ligand signature. 

CONSENSUS: EDETJ-ELIVMTAID-x ( 2 ) -ELIVMH-ELIVMSTAGID^ESAG IB- 

ELI VMSTAGJ- H-EST A J-ELIVMFYID - 






NAME: Peroxidases active site signature- 

CONSENSUS: ESGATV3-X ( 3 ) -ELIVMAJ-R-ELIVMAJ-x-EFbO-H-x-ESAO ■ 



NAME: Catalase proximal heme-ligand signature- 

45 CONSENSUS: R-ELIVMFSTAN31-F-EGASTNP3-Y-x-D-ICAST3-Et2EHni - 

NAME: Catalase proximal active site signature- 

CONSENSUS: EIFJ-x-ERHJ-x (H ) -EEM-R-x (2) -H-x (2) -CGASJ-EGASTF J- 

EGASTU- 


NAME: Glutathione peroxidases selenocysteine active site.
CONSENSUS: EGNH-ERKHNFYO-x-ELIVMFCID-ELI VMFJ ( 2 ) -x-N-EVTJ-x- 

ESTO-x-C-EGAI-x-T - 

55 NAME : Glutathione peroxidases signature 2- 

C0NSENSUS: CLIVl-EAGDU-F-P-ECSID-ENGJ-tf-F - 

NAME: Lipoxygenases iron-binding region signature 1. 

CONSENSUS: H-EEGiB-x ( 3 ) -H-x-ELMD-ENflRCJ-EGSTD-H-ELI VMST AO ( 3 ) - 

E- 

NAME: Lipoxygenases iron-binding region signature 2- 
CONSENSUS: ELIVMA J-H-P-ELIVM J-x-EKRtO-ELIVMFIll ( 2 ) -x-EAPl-H - 

NAME: Extradiol ring-cleavage dioxygenases signature- 

CONSENSUS: EGNTIVJ-x-H-x ( 5 ? ) -ELIVMFJ-Y-x ( E ) -EDENT A3-P-X- 

EGP3-x(S-.3)-E. 

NAME: Intradiol ring-cleavage dioxygenases signature- 

CONSENSUS: ELIVMl-x-G-x-ELIVM J-x (»4 ) -EGSJ-x (2 ) -ELIVMJ-x ( M ) - 

ELIVM3-ICDE3I-ELIVI1FY3- 
CONSENSUS: x (b >-G-x-EFY3 . 

NAME: Indoleamine 2,3-dioxygenase signature 1.

CONSENSUS: G-G-S-EAN:D-EGA:I1-(2-S-S-x(2) -(2- 



NAME: Indoleamine 2,3-dioxygenase signature 2.

20 CONSENSUS : EFYJ-L-EDfl J-EDE3-ELIVM J-x ( 2) -Y-M-x (3) -H-EKRJ • 

NAME: Bacterial ring hydroxylating dioxygenases alpha-subunit signature.

CONSENSUS: C-x-H-R-EGAIQ-x ( A ) -G-N-x ( 5) -C-x-EFYJ-H - 


NAME: Bacterial luciferase subunits signature- 

CONSENSUS: EGAI-ELIVMHl-P-ELIVn J-x-ELIVMFYH-x-U-x ( b ) -ERKJ- 

x<h)-Y-x(3)-EARJ. 

NAME: ubiH/COQ6 monooxygenase family signature.

CONSENSUS: H-P-[LIV]-[AG]-G-Q-G-x-N-x-G-x(2)-D.

NAME: Biopterin-dependent aromatic amino acid hydroxylases signature.

CONSENSUS: P-D-x(2)-H-[DE]-[LI]-[LIVMF]-G-H-[LIVMC]-P.

NAME: Copper type II, ascorbate-dependent monooxygenases signature 1.

CONSENSUS: H-H-M-x(2)-F-x-C.


NAME: Copper type II, ascorbate-dependent monooxygenases signature 2.

CONSENSUS: H-x-F-x(4)-H-T-H-x(2)-G.

45 NAME : Tyrosinase CuA-binding region signature- 

CONSENSUS: H-x ( M 5) -F-ELIVMFTPl-x-EFUJ-H-R-x ( 2 ) -ELMH-x ( 3) -E - 

NAME: Tyrosinase and hemocyanins CuB-binding region 

signature- 

50 CONSENSUS: D-P-x-F-ELIVnFYU3I-x(2)-H-x(3)-I>. 

NAME: Fatty acid desaturases family 1 signature.

CONSENSUS: G-E-x-[FY]-H-N-[FY]-H-H-x-F-P-x-D-Y.

55 NAME: Fatty acid desaturases family 2 signature- 

CONSENSUS : ESTJ-ESA J-x ( 3 ) -EflRJ-ELO-x (5-.b ) -D-Y-x (2 ) - 

ELIVMFYIO-ELIVM3-EDEJ • 
NAME: Cytochrome P450 cysteine heme-iron ligand signature.

CONSENSUS: . EFtO-ESGNHJ-x-EGDl-x-ERHPT J-x-C-ELI VMFAPU-EG ADD - 

NAME: Heme oxygenase signature.

CONSENSUS: L-L-V-A-H-A-Y-T-R.

NAME : Copper/Zinc superoxide dismutase signature 1- 

CONSENSUS: CGAH-CIFATni-H-IELIVFI-H-x (2 ) -IEGPDI-CSDGJ-x-IESTAGPII . 

10 NAME: Copper/Zinc superoxide dismutase signature 2- 

CONSENSUS : G-EGN3-ESGA3-G-x-R-x-ESGA3-C-x ( 2 ) -CI VI • 






NAME: Manganese and iron superoxide dismutases signature- 

CONSENSUS : D-X-U-E-H-ESTA3-EFY3 (2) - 

NAME: Ribonucleotide reductase large subunit signature.

CONSENSUS : b}-x(2)-ELF3-x(ti-i7) -G-ELIVM3-EFYRA31-ENH3-X (3) - 

ESTA(3LIVrD-0:ASC:D-x<2)- 
CONSENSUS: EPA3 • 

NAME: Ribonucleotide reductase small subunit signature- 

CONSENSUS: EIVMSEflH-E-x CL-,2 ) -ELIVTAl-EHYIQ-EGSAID-x-ESTAVMIII-Y- 

x(2)-ELIVniO-x<3>- 

CONSENSUS: ELIFY2-EIVF YCSA3 - 

NAME: Nitrogenases component 1 alpha and beta subunits 

signature 1. 

CONSENSUS: ELIVMFYHJ-ELIVMFSTJ-H-EAGJ-EAGSP J-ELIVMNtfAID-EAGID- 

O 

NAME : Nitrogenases component 1 alpha and beta subunits 

signature 2- 

CONSENSUS: ESTANdO-EET J-C-x ( 5 ) -G-D-EDNJ-ELIVIITDl-x-ESTAGRID- 

ELIVMFYSTJ. 

NAME: NifH/frxC family signature 1.

CONSENSUS: E-x-G-G-P-x(2)-[GA]-x-G-C-[AG]-G.



NAME: NifH/frxC family signature 2.

CONSENSUS: D-x-L-G-D-V-V-C-G-G-F-[AG]-x-P.

NAME: Nickel-dependent hydrogenases large subunit signature 1.

CONSENSUS: R-G-ELIVMF3-E-X (IS ) -ItJESMl-R-x-C-G-CLIVIIID-C - 


NAME: Nickel-dependent hydrogenases large subunit signature 2.

CONSENSUS: EFYJ-D-P-C-IELini-EASGD-C-x (S-.3)-H. 

NAME: Glutamyl-tRNA reductase signature.

CONSENSUS : H-ELIVM3-X (E)-ELIVn3-EGSTAC3 <3>-ELIVI13-EI>Et23-S- 

E L I V n A 3 - 1 L I V 11 3 ( E ) - E G F 3 - E - 

CONSENSUS : x-E(2R3-EIV3-ELIT3-ISTAG3-(3-ELIVn3-EKR3 . 

55 NAME: Bacterial-type phytoene dehydrogenase signature- 

CONSENSUS : ENG3-x-EFYUV3-ELIVI1F3-x-G-EAGC3-EGS3-ETA3-CH(3T3-P- 

G-ESTAV3-G-ELIVP13- 

CONSENSUS: x(S)-EGS3. 
NAME: Glycine radical signature.

CONSENSUS: li:STIV3-x-R-0:iVT3-IECSA3-G-Y-x-B:GACV3 . 

NAME: Ergosterol biosynthesis ERG4/ERG24 family signature 1.
CONSENSUS: G-x(2)-[LIVM]-Y-D-x-[FY]-x-G-x(2)-L-N-P-R.

NAME: Ergosterol biosynthesis ERG4/ERG24 family signature 2.
CONSENSUS: [LIVM](2)-H-R-x(2)-R-D-x(3)-C-x(2)-K-Y-G.

NAME: NNMT/PNMT/TEMT family of methyltransferases signature.
CONSENSUS: L-I-D-I-G-S-G-P-T-[IV]-Y-Q-L-L-S-A-C.



NAME: RNA methyltransferase trmA family signature 1.
15 CONSENSUS: EDN3-P-EPA3-R-X-G-X (m-,lb)-ELIVM3 (2 >-Y-x-S-C-N- 

x(2)-T- 

NAME: RNA methyltransferase trmA family signature 2.
CONSENSUS: ELIVMF3-D-X-F-P-ECJHY3-EST3-X-H-ELIVMFY3-E.. 


NAME: Thymidylate synthase active site- 

CONSENSUS: R-x (2) -ELIVM3-X (3 ) -EFU3-E(2N3-x ( fi -.1 ) -ELV3-X-P-C- 

ehavm3-x(3)-ei2mt3-efyu3- 
consensus: x-elv3- 

NAME: Ribosomal RNA adenine dimethylases signature- 

CONSENSUS : ELIVM3-ELIVMFY3-EDE3-X-G-ESTAPV3-G-X-EGA3-X- 
ELIVMF3-EST3-x<2)-ELIVM3- 

CONSENSUS: x Cb >-ELIVMY3-x-ESTAGV3-ELIVMFYHC3-E-x-]> - 

NAME: Methylated-DNA--protein-cysteine methyltransferase active site.

CONSENSUS: [LIVMF]-P-C-H-R-[LIVMF](2).

NAME: N-6 adenine-specific DNA methylases signature.

CONSENSUS: [LIVMAC]-[LIVFYWA]-x-[DN]-P-P-[FYW].



NAME: N-4 cytosine-specific DNA methylases signature.
CONSENSUS: [LIVMF]-T-S-P-P-[FY].

NAME: C-5 cytosine-specific DNA methylases active site.
CONSENSUS: EDENKS3-X-EFLIV3-X ( 2 ) -EGSTC3-X-P-C-X (2) -EFYULIM3- 

S- 



NAME: C-5 cytosine-specific DNA methylases C-terminal signature.

CONSENSUS: ERK(2GTF3-x (2 ) -G-N-ESTAG3-ELIVMF3-X ( 3) -ELIVMT3- 

x(3>-ELIVM3-x<3)-ELIVM3- 

NAME: Protein-L-isoaspartate(D-aspartate) O-methyltransferase signature.

CONSENSUS: [GSA]-D-G-x(2)-G-[FYWV]-x(3)-[AS]-P-[FY]-[DN]-x-I.

NAME: Uroporphyrin-III C-methyltransferase signature 1.

55 CONSENSUS: ELIVM3-EGS3-ESTAL3-G-P-G-X ( 3 ) -ELIVMFYJ-ELIVM3-T- 

ELIVM3-EKRHCG3-EAG3- 

NAME: Uroporphyrin-III C-methyltransferase signature 2.
CONSENSUS: V-x ( 5 ) -ELI3-X ( 2 ) -G-D-x ( 3 ) -EF YU3-EGS3-X ( a ) -ELIVF3- 

x(5-.b)-ELIVI1FYWPACJ- 

CONSENSUS: x-ELIVMYD-x-P-G - 

NAME: ubiE/COQ5 methyltransferase family signature 1.

CONSENSUS: Y-D-x-fl-N-x ( E ) -ELIVM3-S-X ( 3 ) -H-x ( E ) -111 - 

NAME: ubiE/COQ5 methyltransferase family signature 2.

CONSENSUS: R-V-ELIVf13-K-EPV3-G-G-x-ELIVNF3-x(E) -ELI VFO-E-x-S • 



NAME: Serine hydroxymethyltransferase pyridoxal-phosphate attachment site.

CONSENSUS: EJ>EH 3-ELIVMF Y 3-X-ESTMV3-EGST 3-EST 3 ( E ) -H-K-EST 3- 

ELF3-x-G-EPAC3-ER<23- 
15 CONSENSUS: EGSA3-EGA3 - 

NAME: Phosphoribosylglycinamide formyltransferase active site.

CONSENSUS : G-X-ESTM3-EI VT3-x-EFYldVl23-EV[1AT3-x-EDEVM3-x- 

20 ELIVI1Y3-l>-x-G-x<E)-ELIVT:i- 
CONSENSUS : x C b > -ELIVI13 - 

NAME: Aspartate and ornithine carbamoyltransferases signature.

CONSENSUS: F-x-[EK]-x-S-[GT]-R-T.

NAME: Transketolase signature 1- 

C0NSENSUS: R-x ( 3 ) -ELI VMTA3-EDEN<2STHKF3-x ( 5 Ea ) -EGSN3-G-H- 

EPLIVMF3-EGSTA3-x(E)- 
30 CONSENSUS: ELII1C3-EGS3. 

NAME: Transketolase signature 2.

CONSENSUS: G-EDE(2GSA:il-EDN3-G-EPAE{33-EST 3-EHt23-x-EPAGM3- 

ELIVMYAC3-EDEFYU3-x(2)- 
35 CONSENSUS: ESTAPJ-x <E ) -ERGA3 - 

NAME: Transaldolase signature 1- 

C0NSENSUS: EDG3-EIVSA3-T-EST3-N-P-ESTA3-ELI VMF3 < £ > - 

40 NAME: Transaldolase active site- 

CONSENSUS: ELIVM3-X-ELI VrO-K-ELIVM3-EPAS3-x-EST3-x-EDEN<2PAS3- 

G-ELIVM3-X-EAGV3-X- 

CONSENSUS: EdEKRSTJ-x-ELIVrO . 

NAME: Acyltransferases ChoActase / COT / CPT family signature 1.

C0NSENSUS: ELI3-P-X-EL VP3-P-EIVTA3-P-x-ELIVrO-x-EDEN(3AS3- 

EST3-ELIVM3-x(£)-ELY3- 

NAME: Acyltransferases ChoActase / COT / CPT family signature 2.

CONSENSUS: R-EFYLD-x-ED A3-EKA3-X ( □ 1 ) -ELIVI1FY3-X-ELIVMF Y J ( E ) - 

x(3)-EDNS3-EGSA3-x CD- 
CONSENSUS: EDE3-EHS3-xC3)-EDE3-EGA3- 



NAME: Thiolases acyl-enzyme intermediate signature.

CONSENSUS? ELI VM3-ENST3-X ( E ) -C-ESAGLI3-EST3-ESAG3-ELI VMF YNS3- 

x-ESTAG3-ELIVN3-x(b)- 
CONSENSUS: [LIVM].

NAME: Thiolases signature 2- 

CONSENSUS: N-x <2) -G-G-x-ELIVMJ-ESAJ-x-C-H-P-x-G-x-ESTJ-G - 


NAME: Thiolases active site. 

CONSENSUS: EAGJ-ELIVMAJ-ESTAGLIVMH-ESTAGD-ELIVMAH-C-x-EAGD-x- 
EAGD-x-EAGH-x-ESAGl • 

NAME: Chloramphenicol acetyltransferase active site.

CONSENSUS: Q-[LIV]-H-H-[SA]-x(2)-D-G-[FY]-H.

NAME: Hexapeptide-repeat containing-transferases signature.

CONSENSUS: ELIV3-EGAED3-X (2 ) -ESTAVJ-x-ELIVU-x (3 ) -ELIVACJ-x- 

15 ELIV3-EGAEl>l-x(5)- 

CONSENSUS: ESTAVRJ-x-ELIVJ-EGAEDH-x ( 2 ) -ESTAVl-x-ELIVJ-x ( 3) - 

ELIVJ- 

NAME: Beta-ketoacyl synthases active site. 

20 CONSENSUS: G-xCD -ELlVMFAP3-x(2) -EAGCJ-C-ESTA J (2)-ESTAG3- 

x(3)-ELIVMF:d. 

NAME: Chalcone and stilbene synthases active site.

CONSENSUS: R-ELIVMFYSH-x-ELIVMH-x-ElJHGJ-x-G-C-EFYNAll-EGAJ-G- 
25 EGAJ-ESTAVJ-x-ELIVMFJ- 
CONSENSUS: ERA3- 

NAME: Myristoyl-CoA:protein N-myristoyltransferase signature 1.

CONSENSUS: E-I-N-F-L-C-x-H-K.

NAME: Myristoyl-CoA:protein N-myristoyltransferase signature 2.

CONSENSUS: K-F-G-x-G-D-G.


NAME : Gamma-glutamyltranspeptidase signature- 

CONSENSUS: T-ESTAJ-H-x-ESTH-ELIVMAD-x ( 4 ) -G-ESNJ-x-V-ESTAH-x- 

T-x-T-ELIVMJ-ENEH- 

CONSENSUS: x(l,2)-EFY3-G- 


NAME: Transglutaminases active site. 

CONSENSUS: EGTl-CJ-ECAH-lil-V-x-ESA J-EGA3-EIVT3-X (2) -T-x-ELMSCJ- 
R-ECSA3-ELV3-G- 

NAME: Phosphorylase pyridoxal-phosphate attachment site.

CONSENSUS: E-A-[SC]-G-x-[GS]-x-M-K-x(2)-[LM]-N.

NAME: UDP-glycosyltransferases signature- 

CONSENSUS: EFIiO-x ( 2 ) -<2-x (2 ) -ELIVMYAH-ELIMVJ-x ( H ib > -ELVGACJ- 

50 ELVFYA3-ELIVMF3-ESTAGCM3- 

CONSENSUS: EHNOH-ESTAGO-G-x (2 ) -ESTAGJ-x C3) -ESTAGL3-ELIVMFAJ- 

x(M)-EPtJR3-ELIVMT3- 

CONSENSUS: x < 3) -EPA3-X (3 ) -EDES3-E(JEHN3 • 

NAME: Purine/pyrimidine phosphoribosyl transferases signature.

CONSENSUS: ELIVMFYUCTA3-ELIVM3-ELIVMA3-ELIVMFC3-EDE3-D- 
ELIVHSJ-ELIVM J-ESTAVDJ- 
CONSENSUS: CSTAR3I-CG ACJ-x-CST ARID . 

NAME: Glutamine amidotransferases class-I active site.

CONSENSUS: EPASJ-ELI VMFYTID-ELIVMFYJ-G-ELIVMFYID-C-CLIVMFYNJ-G- 

5 x-C(3EH3-x-CLIVI1FA3. 

NAME: Glutamine amidotransferases class-II active site.

CONSENSUS : <x ( 0 11) -C-EGSJ-EIV J-ELIVMFYIO-EAGID - 

NAME: Purine and other phosphorylases family 1 signature.

CONSENSUS: EGSTJ-x-G-ELIVMJ-G-x-EPAID-S-x-IEGST AID-I-x (3) -E-L . 

NAME: Purine and other phosphorylases family 2 signature.

CONSENSUS : ELIV3-X ( 3 ) -G-x (2 ) - H-x-ELIVMFYJ-x ( *4 ) -ELIVMF3-X (3 ) - 

15 IEATVJ-x(l-.B)-ELIVrini-x- 

CONSENSUS: EATVJ-x ( l 4 ) -EGN2~x (3 1 M ) -ELIVMF J (2) -x (2 >-ESTNJ-ESA2- 

x-G-EGSl-CLIVrO- 

NAME: Thymidine and pyrimidine-nucleoside phosphorylases signature.

CONSENSUS: S-EGS3-R-EGAIB-ELI VJ-x ( 2 ) -ETAID-EGA J-G-T-x-D-x- 

ELIVJ-E. 

NAME: ATP phosphoribosyl transferase signature. 

25 CONSENSUS: E-x ( 5) -G-x-CSAGl-x (2 ) -EIVJ-x-D-ELIV3-x< 2 > -EST3-G- 

x-T-ELMJ. 






NAME: NAD:arginine ADP-ribosyltransferases signature.
CONSENSUS: [FY]-x-[FY]-K-x(2)-H-[FY]-x-L-[ST]-x-A.

NAME: Prolipoprotein diacylglyceryl transferase signature.
CONSENSUS: G-R-x-[GA]-N-F-[LIVMF]-N-x-E-x(2)-G.



NAME: S-adenosylmethionine synthetase signature 1.

CONSENSUS: G-A-G-D-Q-G-x(3)-G-Y.

NAME: S-adenosylmethionine synthetase signature 2.

CONSENSUS: G-[GA]-G-[ASC]-F-S-x-K-[DE].

NAME: Polyprenyl synthetases signature 1.

CONSENSUS: [LIVM](2)-x-D-D-x(2,4)-D-x(4)-R-R-[GH].

NAME: Polyprenyl synthetases signature 2.

CONSENSUS: ELIVMFYJ-G-x ( 2 ) -EFYL3-(2-ELIVM3-x-D-I>-ELIVMF Y3-x- 

45 EDNG1 • 

NAME: Squalene and phytoene synthases signature 1- 

CONSENSUS: Y-ECSAMJ-x (2>-EVSG:B-A-EGSA J-ELI VATJ-EIVJ-G-x C2)- 

ELMSC3-x(2)-ELIV3- 



NAME: Squalene and phytoene synthases signature 2- 

CONSENSUS: ELIVM3-6-X (3 ) -fl-x (2-, 3 ) -N-EIFl-x-R-D-ELIVMF Y3-x ( 2) - 

EDE3-x(M-,7)-R-x-EFY3- 
CONSENSUS: x-P- 

NAME: Protein prenyltransferases alpha subunit repeat signature.
CONSENSUS: EPSIAV3-x-ENI>FV3-ENE(2IY3-x-ELIVMAGP3-liJ-EN<2STHF3- 
EFYH<0-ELIVMR3- 

NAME : Riboflavin synthase alpha chain family signature- 

5 CONSENSUS: ELIVMF3-x(5) -G-ESTADN(23-EKRE(2IYIi!3-V-N-ELIVM3-E - 

NAME: Dihydropteroate synthase signature 1.

CONSENSUS: [LIVM]-x-[AG]-[LIVMF](2)-N-x-T-x-D-S-F-x-D-x-[SG].

NAME: Dihydropteroate synthase signature 2.

CONSENSUS: [GE]-[SA]-x-[LIVM](2)-D-[LIVM]-G-[GP]-x(2)-[STA]-x-P.

NAME: EPSP synthase signature 1.

CONSENSUS: [LIVM]-x(2)-[GN]-N-[SA]-G-T-[STA]-x-R-x-[LIVMY]-x-[GSTA].

NAME: EPSP synthase signature 2.

CONSENSUS: EKR3-X-EKH3-E-ECST3-EDNE3-R-ELIVM3-X-ESTA3- 
20 ELIVMC3-x(2)-EEN3-ELIVMF3-x- 

CONSENSUS: EKRA3-ELIVMF3-G - 






NAME: FLAP/GST2/LTC4S family signature.

CONSENSUS: G-x(3)-F-E-R-V-[FY]-x-A-[NQ]-x-N-C.



NAME: Aminotransferases class-I pyridoxal-phosphate 

attachment site. 

CONSENSUS : E6S3-ELIVMFYTAC3-EGSTA3-K-X (2 ) -EGSALVN3-ELIVMFA3- 

X-EGNAR3-X-R-ELIVMA3- 
30 CONSENSUS: EGA3- 

NAME: Aminotransferases class-II pyridoxal-phosphate 

attachment site- 

CONSENSUS : T-ELIVMFYU3-ESTAG3-K-ESAG3-ELIVnFYtilR3-ESAG3-x (2>- 

35 ESAG3- 

NAME: Aminotransferases class-III pyridoxal-phosphate attachment site.

CONSENSUS: ELIVMFYUC3 < 2) -x-D-E-ELIVMA3-x (2 ) -EGP3-X (□ -. 1 )- 

40 ELIVMFYUAG3-x(D-.l)-ESACR3-x- 

CONSENSUS: EGSAD3-X ( 12 -.lb ) -D-ELIVMFYUC3-X < 2 -.3) -EGSA3-K-X (3 >- 

EGSTADN3-EGSA3 - 

NAME: Aminotransferases class-IV signature- 

45 CONSENSUS: E-x-ESTAGCI3-x (2 > -N-ELIVMFAC3-EFY3-X (b-,12) - 

ELIVMF3-x-T-x(bifl)-ELIVM3-x- 
CONSENSUS: EGS3-ELIVM3-X-EKR3 - 

NAME: Aminotransferases class-V pyridoxal-phosphate 

50 attachment site- 

CONSENSUS: ELIVFYCHT3-EDGH3-ELIVMFYAC3-ELIVMFYA3-X (2) - 

EGSTAC3-IGSTA3-EHAR3-K- 

CONSENSUS: x <M-.b)-G-x-EGSAT3-x-ELIVMFYSAC3 - 

55 NAME: Hexokinases signature. 

CONSENSUS:- ELIVM3-G-F-ETN3-F-S-EFY3-P-X ( 5) -ELIVM3-EDNST3- 

x(3)-ELIVM3-x(2)-U-T-K-x- 

CONSENSUS: ELF3- 
NAME: Galactokinase signature.

CONSENSUS: G-R-x-N-[LIV]-I-G-E-H-x-D-Y.

5 NAME: GHMP kinases putative ATP-binding domain. 

CONSENSUS: [LLIVMID-EPO-x-EGSTA J-x ( □ 1 ) -G-L-tGS J-S-S-ICGSA3- 

EGSTAC3 • 

NAME: Phosphofructokinase signature.
10 CONSENSUS : (ERKI-x( L 4)-G-H-x-d2-C(3R31-G-G-x(S)-D-R. 

NAME: pfkB family of carbohydrate kinases signature 1.

CONSENSUS : EAGID-G-x (□ n 1 ) -EGAPD-x-N-x-IIISTAID-x ( L ) -1EGS3-X ( -G . 

NAME: pfkB family of carbohydrate kinases signature 2.

CONSENSUS : EDNSO-EPSTV3-x-[[SAG3<2)-ILGD3-I>-x<3) -ESAGV3-EAG3- 

ELIVMFY3-ELIVMSTAP3. 

NAME: ROK family signature.

20 CONSENSUS: ELIVM3-X < 2 > -G-ELIVMFCT 3-G-X-EGA3-ELIVMFA 3-x (fl)-G- 

x(3-.5)-EGATP3-x<2)- 
CONSENSUS : G-ERKH3 - 

NAME: Phosphoribulokinase signature.

CONSENSUS: K-[LIVM]-x-R-D-x(3)-R-G-x-[ST]-x-E.

NAME: Thymidine kinase cellular-type signature. 

CONSENSUS: EGA3-X ( 1 ->2 ) -EDE3-x-Y-x-ESTAP3-x-C-ENKR3-x-ECH 3- 

ELIVMFYldHI- 


NAME: FGGY family of carbohydrate kinases signature 1- 

CONSENSUS: EMFYGS3-x-EPST3-x(2) -K-ELIVMFYLO-x-td-ICLIVMFl-x- 

OENflTKRJ-EENiahO- 

35 NAME: FGGY family of carbohydrate kinases signature 2- 

CONSENSUS: EGSA3-X-ELI VMFYtiO-x-G-ELI VM3-X ( 7 -, fi ) -EHDENO- 

ELIVMF3-x<2)-EAS3-ESTAIVM3- 
CONSENSUS: ELIVMF Y3-EDE(23 • 

40 NAME: Protein kinases ATP-binding region signature. 

CONSENSUS : ELIV3-G-{P>-G-{P>-EFYUMGSTNH3-ESGA3-{Pld>-ELIVCAT3- 
■CPDJ-x-EGSTACLIVMFYH- 

CONSENSUS: x ( 5 -i Ifl ) -ELI VMFYUCSTAR3-EAI VP3-ELIVMFAGCKR3-K - 

NAME: Serine/Threonine protein kinases active-site signature.

CONSENSUS: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3).

NAME: Tyrosine protein kinases specific active-site signature.

CONSENSUS: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3).

NAME: Protein kinase domain profile.

NAME: Casein kinase II regulatory subunit signature.
CONSENSUS: C-P-x-ELIVMYD-x-C-x ( S ) -L-P-ELIVMCI-G-x ( T ) -V-EKR])- 

x(2>-C-P-x-C- 

NAME: Pyruvate kinase active site signature- 

5 CONSENSUS: ILIVACU-x-ELIVMD (B)-CSAPCV3-K-ELIV]l-E-IENKRST3-x- 

EDEflHI-EGSTAI-ELIVMH- 

NAME: Shikimate kinase signature- 

CONSENSUS : EKRl-x CS ) -E-x ( 3) -ELIVMFJ^x ( & vlS) -ELIVMF3 (2 ) -CSA3- 

10 x-6(3)-x-CLIVMF3. 

NAME: Prokaryotic diacylglycerol kinase signature- 

CONSENSUS: E-x-ELIVMJ-N-ESTJ-ESAll-ELIVJ-E-x (2 ) -V-D - 

NAME: Phosphatidylinositol 3- and 4-kinases signature 1.

CONSENSUS: CLIVHFAC3-K-x(li3)-EI>EA3-EDE3-ELIVt1C3-R-(3-a:DE]l- 
x(4)-<J- 

NAME: Phosphatidylinositol 3- and 4-kinases signature 2.

CONSENSUS: [GS]-x-[AV]-x(3)-[LIVM]-x(2)-[FYH]-[LIVM](2)-x-[LIVMF]-x-D-R-H-x(2)-N.






NAME: Acetate and butyrate kinases family signature 1.
CONSENSUS: [LIVM](2)-x-[LIVM]-N-x-G-S-[ST]-S-x-[KE].

NAME: Acetate and butyrate kinases family signature 2- 
CONSENSUS: ELIVMA3 (2) -x (2 > -H-x-G-x-G-x-ESTl-ELIVMD-x-EAVH- 

x(3)-G. 

NAME: Phosphoglycerate kinase signature.

CONSENSUS: [KRHGTCV]-[VT]-[LIVMF]-[LIVMC]-R-x-D-x-N-[SACV]-P.

NAME: Aspartokinase signature.

CONSENSUS: [LIVM]-x-K-[FY]-G-G-[ST]-[SC]-[LIVM].

NAME: Glutamate 5-kinase signature.

CONSENSUS: [GSTN]-x(2)-G-x-G-[GC]-[IM]-x-[STA]-K-[LIVM]-x-[SA]-[TCA]-x(2)-[GALV]-x(3)-G.

NAME: ATP:guanido phosphotransferases active site.
CONSENSUS: C-P-x(0,1)-[ST]-N-[IL]-G-T.



NAME: PTS HPR component histidine phosphorylation site signature.

CONSENSUS: G-[LIVM]-H-[STA]-R-[PA]-[GSTA]-[STAM].

NAME: PTS HPR component serine phosphorylation site signature.

CONSENSUS: [GSADE]-[KREQTV]-x(4)-[KRN]-S-[LIVMF](2)-x-[LIVM]-x(2)-[LIVM]-[GAD].



NAME: PTS EIIA domains phosphorylation site signature 1.

CONSENSUS: G-x(2)-[LIVMF](3)-H-[LIVMF]-G-[LIVMF]-x-T-[ALV].

NAME: PTS EIIA domains phosphorylation site signature 2.

CONSENSUS: [DENQ]-x(6)-[LIVMF]-[GA]-x(2)-[LIVM]-A-[LIVM]-P-H-[GAC].

NAME: PTS EIIB domains cysteine phosphorylation site signature.

CONSENSUS: N-[LIVMFY]-x(5)-C-x-T-R-[LIVMF]-x-[LIVMF]-x-[LIVM]-x-[DQ].

NAME: Adenylate kinase signature.

CONSENSUS: [LIVMFYW](3)-D-G-[FY]-P-R-x(3)-[NQ].

NAME: Nucleoside diphosphate kinases active site.

CONSENSUS: N-x(2)-H-[GA]-S-D-[SA]-[LIVMPKNE].

NAME: Guanylate kinase signature.

CONSENSUS: T-[ST]-R-x(2)-[KR]-x(2)-[DE]-x(2)-G-x(2)-Y-x-[FY]-[LIVMK].

NAME: Guanylate kinase domain profile- 

NAME: Phosphoribosyl pyrophosphate synthetase signature.
CONSENSUS: D-[LI]-H-[SA]-x-Q-[IMST]-[QM]-G-[FY]-F-x(2)-P-[LIVMFC]-D.

NAME: 7,8-dihydro-6-hydroxymethylpterin-pyrophosphokinase signature.

CONSENSUS: G-[PE]-R-x(2)-D-L-D-[LIVM](2).

NAME: Bacteriophage-type RNA polymerase family active site signature 1.

CONSENSUS: P-[LIVM]-x(2)-D-[GA]-[ST]-[AC]-[SN]-[GA]-[LIVMFY]-Q.

NAME: Bacteriophage-type RNA polymerase family active site signature 2.

CONSENSUS: [LIVMF]-x-R-x(3)-K-x(2)-[LIVMF]-M-[PT]-x(2)-Y.






NAME: Eukaryotic RNA polymerase II heptapeptide repeat.

CONSENSUS: Y-[ST]-P-[ST]-S-P-[STANK].
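
Consensus patterns of this kind follow the PROSITE notation: square brackets list the residues allowed at a position, curly braces list residues that are excluded, x stands for any residue, and a number in parentheses gives a repeat count or range. As an illustration only, the following minimal Python sketch translates such a pattern into a regular expression so that a protein sequence can be scanned for it. The function name prosite_to_regex and the example sequence are hypothetical and are not part of the original disclosure.

```python
import re

# One pattern element: optional "<" anchor, a residue / "x" / [allowed] / {excluded},
# an optional repeat "(n)" or "(n,m)", and an optional ">" anchor.
_ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        start, core, low, high, end = match.groups()
        if core == "x":
            token = "."                      # any residue
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"  # excluded residues
        else:
            token = core                     # single residue or [allowed set]
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + token + ("$" if end else ""))
    return "".join(parts)


# Hypothetical usage: scan an illustrative sequence for the heptapeptide repeat.
regex = prosite_to_regex("Y-[ST]-P-[ST]-S-P-[STANK].")
hit = re.search(regex, "MAAYSPTSPSGG")
```

Here regex is the string Y[ST]P[ST]SP[STANK]; a non-None hit indicates that the sequence contains the signature.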



NAME: RNA polymerases beta chain signature.

CONSENSUS: G-x-K-[LIVMFA]-[STAC]-[GSTN]-x-[HSTA]-[GS]-[QNH]-K-G-[IVT].

NAME: RNA polymerases M / 15 Kd subunits signature.

CONSENSUS: F-C-x-OEKST J-C-EGNO-ONSAH-ELIVMI-O-ELIVrO- 
45 x<fl,m)-C-xCH)-C. 

NAME: RNA polymerases D / 30 to 40 Kd subunits signature.

CONSENSUS: N-ESGA:D-li:LIVnT:]l-R-R.-x( [ l).-li:SA:B-x(3)-V-x(4> -N-x- 
CSTA3-x(3)-IC»N3-E-x-ELI3- 

50 CONSENSUS: EGAJ-x-R-ELIJ-ICGAl-ICLIVIO (S) -P . 

NAME: RNA polymerases H / 23 Kd subunits signature.

CONSENSUS : H-INEIJ-ELIVII J-V-P-x-H-x ( 2 ) -ELIVrD-x (2) -OE3 . 

NAME: RNA polymerases K / 14 to 18 Kd subunits signature.

CONSENSUS: EST3-x-EFY3-E-x-CATlI-R-x-0[LIVrill-EGSA3-x-R-ESA3-x- 
Q.
NAME: RNA polymerases L / 13 to 16 Kd subunits signature.

CONSENSUS: OE3 (2 ) -H-EST3-IELIVM3-IEGAP3-N-X < 11 ) -V-X-ILFM3-X ( 2 ) - 

Y-x(3)-H-P- 

NAME: RNA polymerases N / 8 Kd subunits signature.

CONSENSUS: [LIVMF](2)-P-[LIVM]-x-C-F-[ST]-C-G.

NAME: DNA polymerase family A signature.

CONSENSUS: R-x(2)-[GSAV]-K-x(3)-[LIVMFY]-[AGQ]-x(2)-Y-x(2)-[GS]-x(3)-[LIVMA].

NAME: DNA polymerase family B signature.

CONSENSUS: [YA]-[GLIVMSTAC]-D-T-D-[SG]-[LIVMFTC]-x-[LIVMSTAC].

NAME: DNA polymerase family X signature.

CONSENSUS: G-[SG]-[LFY]-x-R-[GE]-x(3)-[SGCL]-x-D-[LIVM]-D-[LIVMFY](3)-x(2)-[SAP].

NAME: Galactose-1-phosphate uridyl transferase family 1 active site signature.

CONSENSUS: F-E-N-[RK]-G-x(3)-G-x(4)-H-P-H-x-Q.

NAME: Galactose-1-phosphate uridyl transferase family 2 signature.

CONSENSUS: D-L-P-I-V-G-G-[ST]-[LIVM](2)-[SA]-H-[DEN]-H-[FY]-Q-G-G.

NAME: ADP-glucose pyrophosphorylase signature 1.
CONSENSUS: [AG]-G-G-x-G-[ST]-x-L-x(2)-L-[TA]-x(3)-A-x-P-A-[LV].

NAME: ADP-glucose pyrophosphorylase signature 2.

CONSENSUS: W-[FY]-x-G-[ST]-A-[DNSH]-[AS]-[LIVMFYW].

NAME: ADP-glucose pyrophosphorylase signature 3.

CONSENSUS: [APV]-[GS]-M-G-[LIVMN]-Y-[IVC]-[LIVMFY]-x(2)-[DENPHK].

NAME: Phosphatidate cytidylyltransferase signature.

CONSENSUS: S-x-[LIVMF]-K-R-x(4)-K-D-x-[GSA]-x(2)-[LI]-[PG]-x-H-G-G-[LIVM]-x-D-R-[LIVMFT]-D.

NAME: Ribonuclease PH signature.

CONSENSUS: C-[DE]-[LIVM](2)-Q-[GTA]-D-G-[SG]-x(2)-[TA]-A.






NAME: 2'-5'-oligoadenylate synthetases signature 1.
CONSENSUS: G-G-S-x-[AG]-[KR]-x-T-x-L-[KR]-[GST]-x-S-D-[AG].

NAME: 2'-5'-oligoadenylate synthetases signature 2.
CONSENSUS: R-P-V-I-L-D-P-x-[DE]-P-T.



NAME: CDP-alcohol phosphatidyltransferases signature.

CONSENSUS: D-G-x(2)-A-R-x(8)-G-x(3)-D-x(3)-D.

NAME: PEP-utilizing enzymes phosphorylation site signature. 
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CONSENSUS: G-EG A3-X-ETN3-X-H-EST A3-EST AV3-CLIVf13 ( 2 ) -ESTAV3- 

KRG1 . 

NAME: PEP-utilizing enzymes signature 2.

CONSENSUS: [DEQS]-x-[LIVMF]-S-[LIVMF]-G-[ST]-N-D-[LIVM]-x-Q-[LIVMFYGT]-[STALIV]-[LIVMF]-[GAS]-x(2)-R.

NAME: Rhodanese signature 1.

CONSENSUS: [FY]-x(3)-H-[LIV]-P-G-A-x(2)-[LIVF].

NAME: Rhodanese C-terminal signature.

CONSENSUS: [AV]-x(2)-[FY]-[DEAP]-G-[GSA]-[WF]-x-E-[FYW].

NAME: CoA transferases signature 1.

CONSENSUS: [DN]-[GN]-x(2)-[LIVMFA](3)-G-G-F-x(3)-G-x-P.

NAME: CoA transferases signature 2.
CONSENSUS: [LF]-[HQ]-S-E-N-G-[LIVF](2)-[GA].

NAME: Phospholipase A2 histidine active site.

CONSENSUS: C-C-x(2)-H-x(2)-C.

NAME: Phospholipase A2 aspartic acid active site.
CONSENSUS: [LIVMA]-C-{LIVMFYWPCST}-C-D-x(5)-C.

NAME: Lipases, serine active site.

CONSENSUS: [LIV]-x-[LIVFY]-[LIVMST]-G-[HYWV]-S-x-G-[GSTAC].

NAME: Colipase signature.

CONSENSUS: Y-x(2)-Y-Y-x-C-x-C.

NAME: Lipolytic enzymes "G-D-S-L" family, serine active site.

CONSENSUS: [LIVMFYAG](4)-G-D-S-[LIVM]-x(1,2)-[TAG]-G.

NAME: Lipolytic enzymes "G-D-X-G" family, putative histidine active site.

CONSENSUS: [LIVMF](2)-x-[LIVMF]-H-G-G-[SAG]-[FY]-x(3)-[STDN]-x(2)-[ST]-H.

NAME: Lipolytic enzymes "G-D-X-G" family, putative serine active site.

CONSENSUS: [LIVM]-x-[LIVMF]-[SA]-G-D-S-[CA]-G-[GA]-x-L-[CA].

NAME: Carboxylesterases type-B serine active site.

CONSENSUS: F-[GR]-G-x(4)-[LIVM]-x-[LIV]-x-G-x-S-[STAG]-G.

NAME: Carboxylesterases type-B signature 2.

CONSENSUS: [ED]-D-C-L-[YT]-[LIV]-[DNS]-[LIV]-[LIVFYW]-x-[PQR].

NAME: Pectinesterase signature 1- 

CONSENSUS: EGSTNJ-x ( 5 ) -ELIVM3-X-ELIVM3-X ( E ) -G-x-Y-EDNK3-E-x- 

55 ELIVM-x-CLIVrO. 

NAME: Pectinesterase signature 2.

CONSENSUS: G-[STAD]-[LIVMT]-D-F-I-F-G.

NAME: Peptidyl-tRNA hydrolase signature 1.

CONSENSUS: [FY]-x(2)-T-R-H-N-x-G-x(2)-[LIVMFA](2)-[DE].

NAME: Peptidyl-tRNA hydrolase signature 2.
CONSENSUS: [GS]-x(3)-H-N-G-[LIVM]-[KR]-[DNS]-[LIVMT].

NAME: Alkaline phosphatase active site.
CONSENSUS: [IV]-x-D-S-[GAS]-[GASC]-[GAST]-[GA]-T.

NAME: Histidine acid phosphatases phosphohistidine signature.

CONSENSUS: [LIVM]-x(2)-[LIVMA]-x(2)-[LIVM]-x-R-H-[GN]-x-R-x-[PAS].

NAME: Histidine acid phosphatases active site signature.
CONSENSUS: [LIVMF]-x-[LIVMFAG]-x(2)-[STAGI]-H-D-[STANQ]-x-[LIVM]-x(2)-[LIVMFY]-x(2)-[STA].

NAME: Class A bacterial acid phosphatases signature.
CONSENSUS: G-S-Y-P-S-G-H-T.



NAME: 5'-nucleotidase signature 1.
CONSENSUS: [LIVM]-x-[LIVM](2)-[HEA]-[TI]-x-D-x-H-[GSA]-x-[LIVMF].

NAME: 5'-nucleotidase signature 2.
CONSENSUS: [FYP]-x(4)-[LIVM]-G-N-H-E-F-[DN].

NAME: Fructose-1,6-bisphosphatase active site.
CONSENSUS: [AG]-[RK]-L-x(1,2)-[LIV]-[FY]-E-x(2)-P-[LIVM]-[GSA].

NAME: Serine/threonine specific protein phosphatases signature.

CONSENSUS: [LIVM]-R-G-N-H-E.

NAME: Protein phosphatase 2A regulatory subunit PR55 signature 1.

CONSENSUS: E-F-D-Y-L-K-S-L-E-I-E-E-K-I-N.

NAME: Protein phosphatase 2A regulatory subunit PR55 signature 2.

CONSENSUS: N-[AG]-H-[TA]-Y-H-I-N-S-I-S-[LIVM]-N-S-D.

NAME: Protein phosphatase 2C signature.

CONSENSUS: [LIVMFY]-[LIVMFYA]-[GSAC]-[LIVM]-[FYC]-D-G-H-[GAV].

NAME: Tyrosine specific protein phosphatases active site.
CONSENSUS: [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY].

NAME : Tyrosine specific protein phosphatases profile. 

NAME: Dual specificity protein phosphatase profile. 

NAME: PTP type protein phosphatase profile.

NAME: Inositol monophosphatase family signature 1.

CONSENSUS: [FWV]-x(0,1)-[LIVM]-D-P-[LIVM]-D-[SG]-[ST]-x(2)-[FY]-x-[HKRNSTY].

NAME: Inositol monophosphatase family signature 2.

CONSENSUS: [WV]-D-x-[AC]-[GSA]-[GSAPV]-x-[LIVACP]-[LIV]-[LIVAC]-x(3)-[GH]-[GA].

NAME: Prokaryotic zinc-dependent phospholipase C signature.

CONSENSUS: H-Y-x-[GT]-D-[LIVM]-[DNS]-x-P-x-H-[PA]-x-N.

NAME: Phosphatidylinositol-specific phospholipase X-box domain profile.

NAME: Phosphatidylinositol-specific phospholipase Y-box domain profile.



NAME: 3'5'-cyclic nucleotide phosphodiesterases signature.
CONSENSUS: H-D-[LIVMFY]-x-H-x-[AG]-x(2)-[NQ]-x-[LIVMFY].

NAME: cAMP phosphodiesterases class-II signature.
CONSENSUS: H-x-H-L-D-H-[LIVM]-x-[GS]-[LIVMA]-[LIVM](2)-x-S-[AP].

NAME: Sulfatases signature 1.

CONSENSUS: [SAP]-[LIVMST]-[CS]-[STAC]-P-[STA]-R-x(2)-[LIVMFW](2)-[TR]-G.

NAME: Sulfatases signature 2.

CONSENSUS: G-[YV]-x-[ST]-x(2)-[IVA]-G-K-x(0,1)-[FYWK]-[HL].

NAME: AP endonucleases family 1 signature 1.
CONSENSUS: [APF]-D-[LIVMF](2)-x-[LIVM]-Q-E-x-K.

NAME: AP endonucleases family 1 signature 2.

CONSENSUS: D-[ST]-[FY]-R-[KH]-x(7,8)-[FYW]-[ST]-[FYW](2).

NAME: AP endonucleases family 1 signature 3.
CONSENSUS: N-x-G-x-R-[LIVM]-D-[LIVMFYH]-x-[LV]-x-S.

NAME: AP endonucleases family 2 signature 1.
CONSENSUS: H-x(2)-Y-[LIVMF]-[IM]-N-[LIVMCA]-[AG].

NAME: AP endonucleases family 2 signature 2.
CONSENSUS: [GR]-[LIVMF]-C-[LIVM]-D-T-C-H.

NAME: AP endonucleases family 2 signature 3.

CONSENSUS: [LIVMW]-H-x-N-[DE]-[SA]-K-x(3)-G-[SA]-x(2)-D.

NAME: Deoxyribonuclease I signature 1.

CONSENSUS: [LIVM](2)-[AP]-L-H-[STA](2)-P-x(5)-E-[LIVM]-[DN]-x-L-x-[DE]-V.

NAME: Deoxyribonuclease I signature 2.

CONSENSUS: G-D-F-N-A-x-C-[SA].

NAME: Endonuclease III iron-sulfur binding region signature.
CONSENSUS: C-x(3)-[KRS]-P-[KRAGL]-C-x(2)-C-x(5)-C.

NAME: Endonuclease III family signature.

CONSENSUS: [GST]-x-[LIVMF]-P-x(5)-[LIVMW]-x(2,3)-[LI]-[PAS]-G-V-[GA]-x(3)-[GAC]-x(3)-[LIVM]-x(2)-[SALV]-[LIVMFYW]-[GANK].

NAME: Ribonuclease II family signature.

CONSENSUS: [HI]-[FYE]-[GSTAM]-[LIVM]-x(4,5)-Y-[STAL]-x-[FWVAC]-[TV]-[SA]-P-[LIVMA]-[RQ]-[KR]-[FY]-x-D-x(3)-[HQ].






NAME: Ribonuclease III family signature.
CONSENSUS: [DEQ]-[RQ]-[LM]-E-[FYW]-[LV]-G-D-[SAR].

NAME: Bacterial Ribonuclease P protein component signature.
CONSENSUS: [LIVMFYS]-x(2)-A-x(2)-R-[NH]-[KRQL]-[LIVM]-[KRA]-R-x-[LIVMTA]-[KR].

NAME: Ribonuclease T2 family histidine active site 1.
CONSENSUS: [FYWL]-x-[LIVM]-H-G-L-W-P.






NAME: Ribonuclease T2 family histidine active site 2.
CONSENSUS: [LIVMF]-x(2)-[HDGTY]-[EQ]-[FYW]-x-[KR]-H-G-x-C.

NAME: Pancreatic ribonuclease family signature.
CONSENSUS: C-K-x(2)-N-T-F.



NAME: DNA/RNA non-specific endonucleases active site.

CONSENSUS: D-R-G-H-[QIL]-x(3)-A.

NAME: Thermonuclease family signature 1.

CONSENSUS: D-G-D-T-[LIVM]-x-[LIVMC]-x(9,10)-R-[LIVM]-x(2)-[LIVM]-D-x-P-E.

NAME: Thermonuclease family signature 2.

CONSENSUS: D-[KR]-Y-[GQ]-R-x-[LV]-[GA]-x-[IV]-[FYW].



NAME: Beta-amylase active site 1.

CONSENSUS: H-x-C-G-G-N-V-G-D.

NAME: Beta-amylase active site 2.

CONSENSUS: G-x-[SA]-G-E-[LIVM]-R-Y-P-S-Y.

NAME: Glucoamylase active site region signature.

CONSENSUS: [STN]-[GP]-x(1,2)-[DE]-x-W-E-E-x(2)-[GS].

NAME: Polygalacturonase active site.

CONSENSUS: [GSDENKRH]-x(2)-[VMFC]-x(2)-[GS]-H-G-[LIVMAG]-x(1,2)-[LIVM]-G-S.

NAME: Clostridium cellulosome enzymes repeated domain signature.

CONSENSUS: D-[LIVMFY]-[DNV]-x-[DNS]-x(2)-[LIVM]-[DN]-[SALM]-x-D-x(3)-[LIVMF]-x-[RKS]-x-[LIVMF].

NAME: Chitinases family 18 active site.
CONSENSUS: [LIVMFY]-[DN]-G-[LIVMF]-[DN]-[LIVMF]-[DN]-x-E.

NAME: Chitinases family 19 signature 1.

CONSENSUS: C-x(4,5)-F-Y-[ST]-x(3)-[FY]-[LIVMF]-x-A-x(3)-[YF]-x(2)-F-[GSA].

NAME: Chitinases family 19 signature 2.

CONSENSUS: [LIVM]-[GSA]-F-x-[STAG](2)-[LIVMFY]-W-[FY]-W-[LIVM].

NAME: Alpha-lactalbumin / lysozyme C signature.
CONSENSUS: C-x(3)-C-x(2)-[LMF]-x(3)-[DEN]-[LI]-x(5)-C.



NAME: Alpha-galactosidase signature.

CONSENSUS: G-[LIVMFY]-x(2)-[LIVMFY]-x-[LIVM]-D-D-x-W-x(3,7)-R-[DNSF].

NAME: Trehalase signature 1.
CONSENSUS: P-G-G-R-F-x-E-x-Y-x-W-D-x-Y.

NAME: Trehalase signature 2.
CONSENSUS: Q-W-D-x-P-x-[GA]-W-[PA]-P.

NAME: Alpha-L-fucosidase putative active site.

CONSENSUS: P-x(2)-L-x(3)-K-W-E-x-C.



NAME: Glycosyl hydrolases family 1 active site.

CONSENSUS: [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN].

NAME: Glycosyl hydrolases family 1 N-terminal signature.

CONSENSUS: F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA].

NAME: Glycosyl hydrolases family 2 signature 1.

CONSENSUS: N-x-[LIVMFYWD]-R-[STACN](2)-H-Y-P-x(4)-[LIVMFYW](2)-x(3)-[DN]-x(2)-G-[LIVMFYW](4).

NAME: Glycosyl hydrolases family 2 acid/base catalyst.

CONSENSUS: [DENQLF]-[KRVW]-N-H-[AP]-[SAC]-[LIVMF](3)-W-[GS]-x(2,3)-N-E.

NAME: Glycosyl hydrolases family 3 active site.

CONSENSUS: [LIVM](2)-[KR]-x-[EQK]-x(4)-G-[LIVMFT]-[LIVT]-[LIVMF]-[ST]-D-x(2)-[SGADNI].

NAME: Glycosyl hydrolases family 5 signature.

CONSENSUS: [LIV]-[LIVMFYWGA](2)-[DNEQG]-[LIVMGST]-x-N-E-[PV]-[RHDNSTLIVFY].



NAME: Glycosyl hydrolases family 6 signature 1.

CONSENSUS: V-x-Y-x(2)-P-x-R-D-C-[GSAF]-x(2)-[GSA](2)-x-G.

NAME: Glycosyl hydrolases family 6 signature 2.

CONSENSUS: [LIVMYA]-[LIVA]-[LIVT]-[LIV]-E-P-D-[SAL]-[LI]-[PSAG].

NAME: Glycosyl hydrolases family 8 signature.

CONSENSUS: A-[ST]-D-[AG]-D-x(2)-[IM]-A-x-[SA]-[LIVM]-[LIVMG]-x-A-x(3)-[FW].

NAME: Glycosyl hydrolases family 9 active sites signature 1.

CONSENSUS: [STV]-x-[LIVMFY]-[STV]-x(2)-G-x-[NKR]-x(4)-[PLIVM]-H-x-R.

NAME: Glycosyl hydrolases family 9 active sites signature 2.

CONSENSUS: [FYW]-x-D-x(4)-[FYW]-x(3)-E-x-[STA]-x(3)-N-[STA].

NAME: Glycosyl hydrolases family 10 active site.
CONSENSUS: [GTA]-x(2)-[LIVN]-x-[IVMF]-[ST]-E-[LIY]-[DN]-[LIVMF].

NAME: Glycosyl hydrolases family 11 active site signature 1.
CONSENSUS: [PSA]-[LQ]-x-E-Y-Y-[LIVM](2)-[DE]-x-[FYWHN].

NAME: Glycosyl hydrolases family 11 active site signature 2.
CONSENSUS: [LIVMF]-x(2)-E-[AG]-[YWG]-[QRFGS]-[SG]-[STAN]-G-x-[SAF].

NAME: Glycosyl hydrolases family 16 active sites.
CONSENSUS: E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA].

NAME: Glycosyl hydrolases family 17 signature.
CONSENSUS : ELIVMl-x-ELIVMFYUAH (3 ) -ESTAGl-E-ESTAS-G-W-P-ESTNl 

30 x-ISAG(33. 

NAME : Glycosyl hydrolases family 25 active sites signature. 
CONSENSUS: D-ELIVM3-X ( 3) -ENO-EPGJ-x C=l ,10 ) -G-x ( H ) - 

ELIVMFY3(2)-K-x-ESTH-E-EGS3-x(2)- 
35 CONSENSUS: Y-x-EDNH. 

NAME: Glycosyl hydrolases family 31 active site.

CONSENSUS: [GF]-[LIVMF]-W-x-D-M-[NSA]-E.

NAME: Glycosyl hydrolases family 31 signature 2.

CONSENSUS: G-[AV]-D-[LIVMT]-C-G-[FY]-x(3)-[ST]-x(3)-L-C-x-R-W-x(2)-[LV]-[GS]-[SA]-F-x-P-F-x-R-[DN].

NAME: Glycosyl hydrolases family 32 active site.

CONSENSUS: H-x(2)-P-x(4)-[LIVM]-N-D-P-N-G.

NAME: Glycosyl hydrolases family 35 putative active site.
CONSENSUS: G-G-P-[LIVM](2)-x(2)-Q-x-E-N-E-[FY].

NAME: Glycosyl hydrolases family 39 active site.
CONSENSUS: W-x-F-E-x-W-N-E-P-[DN].

NAME: Glycosyl hydrolases family 45 active site.

CONSENSUS: [STA]-T-R-Y-[FYW]-D-x(5)-[CA].

NAME: Prokaryotic transglycosylases signature.

CONSENSUS: [LIVM]-x(3)-E-S-x(3)-[AP]-x(3)-S-x(5)-G-[LIVM]-[LIVMFYW]-x-[LIVMFYW]-x(4)-[SAG].

NAME: Inosine-uridine preferring nucleoside hydrolase family signature.

CONSENSUS: D-x-D-[PT]-[GA]-x-D-D-[TAV]-[VI]-A.

NAME : Alkylbase DNA glycosidases alkA family signature- 

10 CONSENSUS : G-I-G-x-U-IESTID-E AV3-X-ELIVMFY3 ( 2 ) -x-ELI VM3-X ( a ) - 

l[MF3-x(2)-IEEl>3-I>. 

NAME: Formamidopyrimidine-DNA glycosylase signature- 

CONSENSUS: C-x ( 2 M ) -C-x-EGT Afl3-x-EIV3-x ( 7 ) -R-EGSTAN3-(i:STA3-x- 

15 CFYI3-C-x(2)-C-fl. 

NAME: Uracil-DNA glycosylase signature.

CONSENSUS: [KR]-[LIV]-[LIVC]-[LIVM]-x-G-[QI]-D-P-Y.

20 NAME: S-adenosy 1-L-homocysteine hydrolase signature 1- 

CONSENSUS: ECSJ-N-x-EFYLH-S-EST3-ILl2A3-OEN3-x-EAVll (2 ) -A-A- 

CLIVJ-CSAV2- 

NAME: S-adenosyl-L-homocysteine hydrolase signature 2.

CONSENSUS: G-K-x(3)-[LIV]-x-G-Y-G-x-V-G-[KR]-G-x-A.

NAME: Cytosol aminopeptidase signature.

CONSENSUS: N-T-D-A-E-G-R-L.

NAME: Aminopeptidase P and proline dipeptidase signature.

CONSENSUS: [HA]-[GSYR]-[LIVMT]-[SG]-H-x-[LIV]-G-[LIVM]-x-[IV]-H-[DE].

NAME: Methionine aminopeptidase subfamily 1 signature- 

35 CONSENSUS:— EMFY3-x-G-H-G-ELIVMC3-EGSH3-x ( 3 ) -H-x ) -ELIVM3-X- 
EHN3-EYWV3- 

NAME: Methionine aminopeptidase subfamily 2 signature- 

CONSENSUS: EDA3-ELIVMY3-X-K-ELI VM3-D-x-G-x-EHtO-ELIVM3-EI>NS3- 

40 G-x(3)-EDN3- 

NAME: Renal dipeptidase active site- 

CONSENSUS: ELIVM3-E-G-EGA3-X ( 2 ) -ELIVMF3-X ( b ) -L-x (3 ) -Y-x ( 2 ) -G- 



45 



ELIVM3-R. 

NAME: Serine carboxypeptidases, serine active site.

CONSENSUS: [LIVM]-x-[GTA]-E-S-Y-[AG]-[GS].



• NAME: Serine carboxypeptidases-i histidine active site- 

50 CONSENSUS: ELIVF3-X < 2) -ELIVSTA3-x-EIVPST3-x-EGSDNi2L3-ESAGV3- 

ESG3-H-x-EIVA(23-P-x(3)- 
CONSENSUS: EPSA3- 

NAME: Zinc carboxypeptidases -i zinc-binding region 1 

55 signature- 

CONSENSUS: EPO-x-ELIVMF Y3-X-ELIVMFY3-X < M > -H-ESTAG3-x-E-x- 

ELIVM3-ESTAG3-x(b)-. 

CONSENSUS: ELIVMFYTA3- 
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NAME: Zinc carboxypept idases i zinc-binding region 5 
signature • 

CONSENSUS: H-ESTAG3-X ( 3 ) -ELI VME3-X ( 2 ) -ELIVMFYU 2-P-CF YU3 - 

5 

NAME: Serine proteases, trypsin family, histidine active site.

CONSENSUS: [LIVM]-[ST]-A-[STAG]-H-C.

NAME: Serine proteases, trypsin family, serine active site.

CONSENSUS: [DNSTAGC]-[GSTAPIMVQH]-x(2)-G-[DE]-S-G-[GS]-[SAPHV]-[LIVMFYWH]-[LIVMFYSTANQH].

NAME: Serine proteases, subtilase family, aspartic acid active site.

CONSENSUS: [STAIV]-x-[LIVMF]-[LIVM]-D-[DSTA]-G-[LIVMFC]-x(2,3)-[DNH].

NAME: Serine proteases, subtilase family, histidine active site.

CONSENSUS: H-G-[STM]-x-[VIC]-[STAGC]-[GS]-x-[LIVMA]-[STAGCLV]-[SAGM].

NAME: Serine proteases, subtilase family, serine active site.

CONSENSUS: G-T-S-x-[SA]-x-P-x(2)-[STAVC]-[AG].

NAME: Serine proteasesi Vfl familyn histidine active site. 

30 CONSENSUS: EST3-G-ELIVMF YW3 (3) -EGN3-X C 2) -T-ELIVM3-X-T-X (B)-H- 

NAME: Serine proteases-. Vfi familyn serine active site- 

CONSENSUS: T-x (E ) -EGC J-EN(23-S-G-S-x-ELIVM3-EF Y 3 • 

35 NAME: Serine proteasesi omptin family signature 1- 

CONSENSUS: U-T-D-x-S-x-H-P-x-T . 

NAME: Serine proteasesn omptin family signature 2- 

CONSENSUS: A-G- Y-<2-E-EST3-R-EF YW3-S-EF YtiJ3-ETN3-A-x-G-G-EST3- 

40 Y. 

NAME: Prolyl endopeptidase family serine active site- 

C0NSENSUS: D-x < 3>-A-x (3 ) -ELIVMFYU3-X ( 1M ) -G-x-S-x-G-G- 

ELIVMFYLJ3(2) - 
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NAME: Endopeptidase Clp serine active site- 

CONSENSUS: T-x (2 ) -ELIVMF3-G-X-A-ESAC3-S-EMSA J-EPAG3-ESTA J * 



NAME: Endopeptidase Clp histidine active site. 

50 CONSENSUS: R-x(3)-EEAP3-x(3) -ELTVMFYT3-M-ELI VM3-H-(2-P • 

NAME: ATP-dependent serine proteases-. Ion family-i serine 

active site- 

CONSENSUS: D-G-EPD3-S-A-EGS3-ELIVMCA3-ETA3-ELIVM3. 



NAME: Eukaryotic thiol (cysteine) proteases cysteine active site.

CONSENSUS: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV].

NAME: Eukaryotic thiol (cysteine) proteases histidine active site.

CONSENSUS: [LIVMGSTAN]-x-H-[GSACE]-[LIVM]-x-[LIVMAT](2)-G-x-[GSADNH].

NAME: Eukaryotic thiol (cysteine) proteases asparagine active site.

CONSENSUS: [FYCH]-[WI]-[LIVT]-x-[KRQAG]-N-[ST]-W-x(3)-[FYW]-G-x(2)-G-[LFYW]-[LIVMFYG]-x-[LIVMF].

NAME: Ubiquitin carboxyl-terminal hydrolase family 1 

cysteine active-site- 
15 CONSENSUS: &- x (3) -N-ESA 3-C-G-x (3) -ELIVMKB ) -H-ESA3-ELIVrD- 

ESA3- 

NAME: Ubiquitin carboxyl-terminal hydrolases family B 

signature 1- 

20 CONSENSUS: G-ELIVMFYJ-x (1 n3 ) -EAGC3-EN ASMH-x-C-EFYlO-QILIVMCll- 

ENSTJ-ESACV3-x-ELIVi1S]l- 
CONSENSUS: (2- 

NAME: Ubiquitin carboxyl-terminal hydrolases family B 

25 signature B- 

CONSENSUS: Y-x-L-x-ESAGZD-ELIVMFTH-x (B)-H-x-G-x(MtS) -G-H-Y - 



30 



NAME: Caspase family histidine active site- 
CONSENSUS: H-x ( E -. H ) -ESC3-X ( H ) -ELIVMFJ ( B ) -EST1-H-G - 

NAME: Caspase family cysteine active site.
CONSENSUS: K-P-K-[LIVMF](4)-Q-A-C-[RQG]-G.

NAME: Eukaryotic and viral aspartyl proteases active site.

CONSENSUS: [LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-x-[LIVMFSTNC]-x-[LIVMFGTA].

NAME: Neutral zinc metallopeptidases, zinc-binding region signature.

CONSENSUS: [GSTALIVN]-x(2)-H-E-[LIVMFYW]-{DEHRKP}-H-x-[LIVMFYWGSPQ].

NAME: Matrixins cysteine switch.

CONSENSUS: P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ].

NAME: Insulinase familyn zinc-binding region signature. 

CONSENSUS: G-x ( fi ^ > -G-x-ESTAIB-H-ELIVnFYJ-ELIVMCID-EDERNID- 

EHRKL3-ELMFAT3-X-ELFSTHJ-X- 
50 CONSENSUS: EGSTANJ-EGST3 • 

// 

AC PSDlQlbn ^ 
55 DE Glycoprotease family signature- 

CONSENSUS: EKRJ-EGSATJ-x ( M ) -EFYUHL 2-ED(3NGK J-x-P-x-ELIVMFYJ- 

x(3)-H-x(B)-EAGI-H- 

CONSENSUS: ELIVM3- 
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NAME: Proteasome A-type subunits signature. 

CONSENSUS: EFYJ-x ( M) -ESTNV3-X-EF YUO-S-P-x-G-ERKhO-x ( E ) 

ELIVMI-OEl-Y-ESADJ-xCE)- 
5 CONSENSUS: ESAGJ- 

NAME: Proteasome B-type subunits signature- 

CONSENSUS: ELIVMAH-EGSAID-ELIVMFJ-x-EFYLVGACID-x (5) -EGSACFYID- 

ELIVMSTAO(3)-IGAC:i!- 
10 CONSENSUS : EGSTACVJ-EDESl-x ( 15 ) -ERO-x ( IE -, 13 ) -G-x ( E ) -EGSTA3- 



15 
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NAME: Signal peptidases I serine active site- 
CONSENSUS : EGSJ-x-S-M-x-EPSJ-EAT J-ELF 1 • 

NAME: Signal peptidases I lysine active site- 
CONSENSUS: K-R-ELIVMSTA3CE) -G-x-EPG3-G-EDEl-x-ELIVM3-x- 

ELIVMFYJ- 



20 NAME: Signal peptidases I signature 3- 

CONSENSUS: ELIVMFYUJ (E ) -x (E) -G-D-ENHI-x (3 ) -ESNDJ-x ( E) -ESGI . 

NAME: Signal peptidases II signature. 

CONSENSUS: EGAF3-EGA3-EGAS3-ELIVM2-EGAS2-N-ELVMFGJ-ELIVMFY3- 
25 D-R-ELIMFA2 . 

NAME : Peptidase family U3E signature • 

CONSENSUS : E-x-F-x (S) -G-ESA3-ELIVrU-C-x ( H ) -G-x-C-x-ELIVMJ-S . 

30 NAME: Amidases signature. 

CONSENSUS : G-EGA3-S-S-EGS J-G-x-EGSAH-EGS A V Yl-x-ELIVM 3-EGS A3- 

xC&O-EGSAJ-x-CGAID-x-D- 

CONSENSUS: x-EGAH-x-S-ELIVMJ-R-x-P-EGSAO- 

35 -NAME: -Asparaginase / glutaminase active site signature 1- 

C0NSENSUS: ELI VM3-X (E ) -T-G-G-T-EIVJ-CAGSJ - 



NAME: Asparaginase / glutaminase active site signature E- 
C0NSENSUS: G-x-ELIVM J-x ( E ) -H-G-T-D-T-ELIVM 1 - 

NAME: Urease nickel ligands signature- 

CONSENSUS: T-EAYJ-EGA J-EGAT3-ELIVM J-D-x-H-ELIVMJ-H-x (3 ) -P . 



NAME: Urease active site- 

45 CONSENSUS: ELIVMD (E) -ECTH-H-EHNH-L-x ( 3 ) -ELIVM J-x ( S ) -D-ELIVMl]- 
x-F-A. 

NAME: ArgE / dapE / ACY1 / CPGE / yscS family signature 1- 

CONSENSUS: ELIV3-EGALMY 3-ELIVMF3-X-EGSA J-H-x-D-ETVID-ESTAVni - 

50 

NAME: ArgE / dapE / ACY1 / CPGE / yscS family signature E- 

CONSENSUS : EGSTAIJ-ESANtf ID-D-x-K-EGSACNJ-x ( E ) -ELIVMAJ-x ( E ) - 
ELIVMFY3-x(m-»17)-ELIVM3- 

CONSENSUS: x-ELIVMF3-ELIVMSTAG3-ELIVMFA2-x ( B ) -EDNGID-E-E-x- 
55 EGSTN2- 

NAME: Dihydroorotase signature 1- 

CONSENSUS : D-ELIVMFYUSAPJ-H-ELI VA2-H-CLIVF J-ERNID-x-EPGNU - 
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NAME: Dihydroorotase signature 2- 
CONSENSUS: EGA3-EST3-D-X-A-P-H-X (H ) -K - 

NAME: Beta-lactamase class-A active site.

CONSENSUS: [FY]-x-[LIVMFY]-x-S-[TV]-x-K-x(4)-[AGLM]-x(2)-[LC].

NAME: Beta-lactamase class-C active site.
CONSENSUS: F-E-[LIVM]-G-S-[LIVMG]-[SA]-K.

NAME: Beta-lactamase class-D active site.

CONSENSUS: [PA]-x-S-[ST]-F-K-[LIV]-[PAL]-x-[STA]-[LI].

15 NAME: Beta-lactamases class B signature 1 - 

CONSENSUS: ELIJ-x-ESTNID-EHN 1-x-H-EGSTAID-D-x ( 2 ) -G-EGPID-x (7 fi ) - 

EGSJ- 

NAME: Beta-lactamases class B signature 5* 
20 CONSENSUS: P-x (3) -ELIVM3 (2) -X-G-X-C-ELIVMF3 (2) -K- 

NAME: Arginase family signature 1- 

CONSENSUS: ELIVMFJ-G-G-x-H-x-ELIVMTJ-ESTA V3-X-EPAG J-x ( 3) - 



25 



50 



EGSTAJ. 

NAME : Arginase family signature 2- 

CONSENSUS: ELIVM3 ( E ) -x-ELIVMFYJ-D-EASU-H-x-D . 



NAME: Arginase family signature 3. 

30 CONSENSUS: ~ ESTni-ELIVMF YJ-D-ELIVMJ-D-x (3 ) -CPAO-x ( 3 ) -P-EGSA2- 
x(7)-G- 

NAME: Adenosine and AMP deaminase signature- 

CONSENSUS : ESAJ-ELIVMID-ENGSJ-EST A3-D-D-P - 

35 -.. 

NAME: Cytidine and deoxycyt idy late deaminases zinc-binding 

region signature- 

CONSENSUS: CCHJ-E AGVJ-E-x (E ) -ELIVMFGATJ-ELIVrO-x ( 17 -,33) -P-C- 

x(2-.fl)-C-x(3)-ELIVMJ. 

40 

NAME: GTP cyclohydrolase I signature 1. 

CONSENSUS : EEN3-ELIVI13 (E)-x (E)-EKR(2N3I-EI>N3-ELIVm-x (3>-EST3- 

x-C-E-H-H- 

45 NAME: GTP cyclohydrolase I signature 2- 

CONSENSUS: ESA2-x-ERO-x-(2-ELIVMJ-<2-E-ERN3-ELI J-ITSN J • 

NAME: Nitrilases / cyanide hydratase signature 1- 

CONSENSUS: G-x ( 2 ) -ELIVMFYID ( E ) -x-EIF3-x-E-x ( E ) -ELIVM J-x-G-Y-P - 

NAME: Nitrilases / cyanide hydratase active site signature- 
CONSENSUS: G-EGAtfJ-x (2 ) -C-EUA J-E-ENHI-x ( 2 ) -EPSTH-ELIVMFYSI-x- 

EKRJ - 

55 NAME: Inorganic pyrophosphatase signature- 

CONSENSUS: D-ESGDN J-D-EPE3-ELIVMF ID-D-ELIVMGAO • 

NAME: Acylphosphatase signature 1. 
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CONSENSUS: ELIVJ-x-G-x- V-d-G- V-x-EFMJ-R - 

NAME: Acylphosphatase signature 2- 

CONSENSUS: G-EFYU3-EA VC U-EKRdAMID-N-x ( 3 ) -G-x- V-x ( 5 ) -G . 

5 

NAME: ATP synthase alpha and beta subunits signature.
CONSENSUS: P-[SAP]-[LIV]-[DNH]-x(3)-S-x-S.

NAME: ATP synthase gamma subunit signature.
CONSENSUS: [IV]-T-x-E-x(2)-[DE]-x(3)-G-A-x-[SAKR].

NAME: ATP synthase delta (OSCP) subunit signature- 
CONSENSUS: ELIVMJ-x-ELIVflFYTJ-x ( 3 ) -ELIVMTJ-EDENflO-x ( 2 ) - 

ELIVM3-X-EGSA3-G-ELIVMFYGAJ- 
15 CONSENSUS: x-ELIVMJ-EKRHENi3 J-X-EGSEN3 - 

NAME: ATP synthase a subunit signature.

CONSENSUS: [STAGN]-x-[STAG]-[LIVMF]-R-L-x-[SAGV]-N-[LIVMT].

NAME: ATP synthase c subunit signature.

CONSENSUS: [GSTA]-R-[NQ]-P-x(10)-[LIVMFYW](2)-x(3)-[LIVMFYW]-x-[DE].

NAME: E1-E2 ATPases phosphorylation site.
CONSENSUS: D-K-T-G-T-[LIVM]-[TI].

NAME : Sodium and potassium ATPases beta subunits signature 
1. 

CONSENSUS: IF YU3-x ( 2 ) -IF YUHD-x-EFYlO-EDNJ-x ( b ) -ELIVMJ-G-R-T- 

30 x(3)-U- 

NAME: Sodium and potassium ATPases beta subunits signature 
2- 

CONSENSUS = ERK3-X ( 2 ) -C-ERKiSWIIB-x ( 5 ) -L-x ( 2 ) -C-ESA3-G - 



35 



45 



NAME: GDA1/CD3T family of nucleoside phosphatases signature- 

CONSENSUS: ELIVMJ-x-G-x (2) -E-G-x-EFYJ-x-EFbO-ELIVAJ-ETAGJ-x- 

N-EHY3- 



NAME: Iodothyronine deiodinases active site.

CONSENSUS: R-P-L-V-x-N-F-G-S-[CA]-T-C-P-x-F.



NAME: Cutinase-i serine active site- 

C0NSENSUS: P-x-ESTAJ-x-ELIV J-EIVT J-x-EGSJ-G-Y-S-EGJLJ-G - 

NAME: Cutinase-i aspartate and histidine active sites- 
CONSENSUS : C-x ( 3 ) -D-x-EIVI-C-x-G-EGSTl-x (2 ) -ELIVfO-x ( 2 -.3) -H - 



NAME : DDC / GAD / HDC / TyrPC pyr idoxal-phosphate attachment 

50 site- 

CONSENSUS: S-ELIVMF YtO-x ( 5 ) -K-ELIVMFYbJGJ ( 2 ) -x ( 3) -ELIVMFYhD-x- 

ECA3-xC2)-ELIVMFYlil<2 IB- 
CONSENSUS: x(2)-ERK3- 

55 NAME : Orn/Lys/Arg decarboxylases family 1 pyridoxal-P 

attachment site- 

CONSENSUS : ESTAV2-x-S-x-H-K-x ( 2 ) -EGSTANJ ( 2 > -x-ESTAJ-fl- 

ESTAHK2) - 
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NAME: Orn/DAP/Arg decarboxylases family 2 pyridoxal-P 

attachment site- 

CONSENSUS: EFYH-IEPAID-x-K-CSACVll-INHCLFliU-x ( M ) -ELIVMFIB- 

5 ELIVnTA3-x(5)-CLIVnA3-x(3)- 
CONSENSUS: C6TE3- 

NAME: Orn/DAP/Arg decarboxylases family 2 signature 2- 

CONSENSUS: EGSU-x (2 -.b ) -ELIVMSCPJ-x (2) -CLIVMF3-CDNS3I-ELIVI1CA3- 

10 G-G-G-ELIVMFYJ- 

CONSENSUS: EGSTPCEtO . 

NAME: Orotidine S'-phosphate decarboxylase active site- 

CONSENSUS: ELIVnFTAI-IELIVnFJ-x-D-x-K-x^-D-I-inGPJ-x-T- 
15 1ELIVMTA J • 

NAME: Phosphoenolpyruvate carboxylase active site 1.

CONSENSUS: [VT]-x-T-A-H-P-T-[EQ]-x(2)-R-[KRH].

NAME: Phosphoenolpyruvate carboxylase active site 2.

CONSENSUS: [IV]-M-[LIVM]-G-Y-S-D-S-x-K-D-[STAG]-G.

NAME: Phosphoenolpyruvate carboxykinase (GTP) signature.
CONSENSUS: F-P-S-A-C-G-K-T-N.

NAME: Phosphoenolpyruvate carboxykinase (ATP) signature.
CONSENSUS: L-I-G-D-D-E-H-x-W-x-[DE]-x-G-[IV]-x-N.



NAME: Uroporphyrinogen decarboxylase signature 1.

CONSENSUS: P-x-W-x-M-R-Q-A-G-R.

NAME: Uroporphyrinogen decarboxylase signature 2.

CONSENSUS: G-F-[STAGCV]-[STAGC]-x-P-[FYW]-T-[LV]-x(2)-Y-x(2)-[AE]-[GK].

NAME: Indole-3-glycerol phosphate synthase signature- 

CONSENSUS: ELIVnFYJ-ELIVnCl-x-E-ILIVMFYCJ-K-ITKRSPIB-ESTAO-S- 
P-CST3-x(3)-ff:LIVI1FYSTni. 

40 NAI1E: Ribulose bisphosphate carboxylase large chain active 
site* 

CONSENSUS: G-x-ON3-F-x-K-x-D-E - 

NAME: Fructose-bisphosphate aldolase class-I active site- 

45 CONSENSUS: CLIVH3-x-ELIVnFYli)3-E-6-x-ICLS3-L-K-P-ESNni . 

NAME: Fructose-bisphosphate aldolase class-II signature 1- 

CONSENSUS: EFYVIU-x (1 -,3 ) -ELIVMHU-EAPN J-ELIVtU-x (1,H> -ICLIVI13- 
H-x-D-H-CGACHJ. 



NAME: Fructose-bisphosphate aldolase class-II signature 2' 

CONSENSUS: ELIVfU-E-x-E-ELIVrU-G-x ( 2 ) -EGMJ-IEGSTAll-x-E • 



NAME: Malate synthase signature. 

55 CONSENSUS: EKRJ-OENO-H-x ( 2 ) -G-L-N-x-G-x-U-D-Y-ELIVrU-F - 

NAME: Hydroxymethylglutaryl-coenzyme A lyase active site.
CONSENSUS: S-V-A-G-L-G-G-C-P-Y.

NAME: Hydroxymethylglutaryl-coenzyme A synthase active site.
CONSENSUS: N-x-[DN]-[IV]-E-G-[IV]-D-x(2)-N-A-C-[FY]-x-G.

NAME: Citrate synthase signature.
CONSENSUS: G-[FYA]-[GA]-H-x-[IV]-x(1,2)-[RKT]-x(2)-D-[PS]-R.

NAME: Alpha-isopropylmalate and homocitrate synthases signature 1.
CONSENSUS: L-R-[DE]-G-x-Q-x(10)-K.

NAME: Alpha-isopropylmalate and homocitrate synthases signature 2.
CONSENSUS: [LIVMFW]-x(2)-H-x-H-[DN]-D-x-G-x-[GAS]-x-[GASLI].



NAME: KDPG and KHG aldolases active site.
CONSENSUS: G-[LIVM]-x(3)-E-[LIV]-T-[LF]-R.

NAME: KDPG and KHG aldolases Schiff-base forming residue.
CONSENSUS: G-x(3)-[LIVMF]-K-[LF]-F-P-[SA]-x(3)-G.

NAME: Isocitrate lyase signature.
CONSENSUS: K-[KR]-C-G-H-[LMQ].

NAME: Beta-eliminating lyases pyridoxal-phosphate attachment site.
CONSENSUS: Y-x-D-x(3)-M-S-[GA]-K-K-D-x-[LIVM](2)-x-[LIVM]-G-G.

NAME: DNA photolyases class 1 signature 1.
CONSENSUS: T-G-x-P-[LIVM](2)-D-A-x-M-[RA]-x-[LIVM].

NAME: DNA photolyases class 1 signature 2.
CONSENSUS: [DN]-R-x-R-[LIVM](2)-x-[STA](2)-F-[LIVMFA]-x-K-x-L-x(2,3)-W-[KRQ].

NAME: DNA photolyases class 2 signature 1.
CONSENSUS: F-x-E-E-x-[LIVM](2)-R-R-E-L-x(2)-N-F.

NAME: DNA photolyases class 2 signature 2.
CONSENSUS: G-x-H-D-x(2)-W-x-E-R-x-[LIVM]-F-G-K-[LIVM]-R-[FY]-M-N.

NAME: Eukaryotic-type carbonic anhydrases signature.
CONSENSUS: S-E-H-x-[LIVM]-x(4)-[FYH]-x(2)-E-[LIVM]-H-[LIVMFA](2).

NAME: Prokaryotic-type carbonic anhydrases signature 1.
CONSENSUS: C-[SA]-D-S-R-[LIVM]-x-[AP].

NAME: Prokaryotic-type carbonic anhydrases signature 2.
CONSENSUS: [EQ]-Y-A-[LIVM]-x(2)-[LIVM]-x(4)-[LIVMF](3)-x-G-H-x(2)-C-G.

NAME: Fumarate lyases signature.
CONSENSUS: G-S-x(2)-M-x(2)-K-x-N.

NAME: Aconitase family signature 1.
CONSENSUS: [LIVM]-x(2)-[GSACIVM]-x-[LIV]-[GTIV]-[STP]-C-x(0,1)-T-N-[GSTANI]-x(4)-[LIVMA].

NAME: Aconitase family signature 2.
CONSENSUS: G-x(2)-[LIVWPQ]-x(3)-[GAC]-C-[GSTAM]-[LIMPTA]-C-[LIMV]-[GA].

NAME: Dihydroxy-acid and 6-phosphogluconate dehydratases signature 1.
CONSENSUS: C-D-K-x(2)-P-[GA]-x(3)-[GA].

NAME: Dihydroxy-acid and 6-phosphogluconate dehydratases signature 2.
CONSENSUS: [SA]-L-[LIVM]-T-D-[GA]-R-[LIVMF]-S-[GA]-[GAV]-[ST].

NAME: Dehydroquinase class I active site.
CONSENSUS: D-[LIVM]-[DE]-[LIVN]-x(18,20)-[LIVM](2)-x-[SC]-[NHY]-H-[DN].

NAME: Dehydroquinase class II signature.
CONSENSUS: [LIVM]-[NQ]-G-P-N-[LV]-x(2)-L-G-x-R-[QED]-P-x(2)-[FY]-G.



NAME: Enolase signature.
CONSENSUS: [LIV](3)-K-x-N-Q-I-G-[ST]-[LIV]-[ST]-[DE]-[STA].

NAME: Serine/threonine dehydratases pyridoxal-phosphate attachment site.
CONSENSUS: [DESH]-x(4,5)-[STVG]-x-[AS]-[FYI]-K-[DLIFSA]-[RVMF]-[GA]-[LIVMGA].

NAME: Enoyl-CoA hydratase/isomerase signature.
CONSENSUS: [LIVM]-[STA]-x-[LIVM]-[DENQRHSTA]-G-x(3)-[AG](3)-x(4)-[LIVMST]-x-[CSTA]-[DQHP]-[LIVMFY].

NAME: Imidazoleglycerol-phosphate dehydratase signature 1.
CONSENSUS: [LIVMY]-[DE]-x-H-H-x(2)-E-x(2)-[GCA]-[LIVM]-[STAC]-[LIVM].

NAME: Imidazoleglycerol-phosphate dehydratase signature 2.
CONSENSUS: G-x-[DN]-x-H-H-x(2)-E-[STAGC]-x-[FY]-K.

NAME: Tryptophan synthase alpha chain signature.
CONSENSUS: [LIVM]-E-[LIVM]-G-x(2)-[FYC]-[ST]-[DE]-[PA]-[LIVMY]-[AGLI]-[DE]-G.

NAME: Tryptophan synthase beta chain pyridoxal-phosphate attachment site.
CONSENSUS: [LIVM]-x-H-x-G-[STA]-H-K-x-N.

NAME: Delta-aminolevulinic acid dehydratase active site.
CONSENSUS: G-x-D-x-[LIVM](2)-[IV]-K-P-[GSA]-x(2)-Y.

NAME: Urocanase active site.
CONSENSUS: F-Q-G-L-P-x-R-I-C-W.

NAME: Prephenate dehydratase signature 1.
CONSENSUS: [FY]-x-[LIVM]-x(2)-[LIVM]-x(5)-[DN]-x(5)-T-R-F-[LIVMW]-x-[LIVM].

NAME: Prephenate dehydratase signature 2.
CONSENSUS: [LIVM]-[ST]-[KR]-[LIVM]-E-[ST]-R-P.

NAME: Dihydrodipicolinate synthetase signature 1.
CONSENSUS: [GSA]-[LIVM]-[LIVMFY]-x(2)-G-[ST]-[TG]-G-E-[GASNF]-x(6)-[EQ].

NAME: Dihydrodipicolinate synthetase signature 2.
CONSENSUS: Y-[DNS]-[LIVMF]-P-x(2)-[ST]-x(3)-[LIVM]-x(13,14)-[LIVM]-x-[SGA]-[LIVMF]-K-[DEQAF]-[STAC].

NAME: RsuA family of pseudouridine synthase signature.
CONSENSUS: G-R-L-D-x(2)-[ST]-x-G-[LIVMF](4)-[ST]-[DNT].

NAME: Cysteine synthase/cystathionine beta-synthase P-phosphate attachment site.
CONSENSUS: K-x-E-x(3)-[PA]-[STAGC]-x-S-[IVAP]-K-x-R-x-[STAG]-x(2)-[LIVM].

NAME: Phenylalanine and histidine ammonia-lyases signature.
CONSENSUS: G-[STG]-[LIVM]-[STG]-[AC]-S-G-[DH]-L-x-P-L-[SA]-x(2)-[SA].

NAME: Porphobilinogen deaminase cofactor-binding site.
CONSENSUS: E-R-x-[LIVMFA]-x(3)-[LIVMF]-x-G-[GSA]-C-x-[IVT]-P-[LIVMF]-[GSA].

NAME: Cys/Met metabolism enzymes pyridoxal-phosphate attachment site.
CONSENSUS: [DQ]-[LIVMF]-x(3)-[STAGC]-[STAGCI]-T-K-[FYWQ]-[LIVMF]-x-G-[HQ]-[SGNH].

NAME: Glyoxalase I signature 1.
CONSENSUS: [HQ]-[IVT]-x-[LIVFY]-x-[IV]-x(5)-[STA]-x(2)-F-[YM]-x(2,3)-[LMF]-G-[LMF].

NAME: Glyoxalase I signature 2.
CONSENSUS: G-[NTKQ]-x(0,5)-[GA]-[LVFY]-[GH]-H-[IVF]-[CGA]-x-[STAGL]-x(2)-[DNC].

NAME: Cytochrome c and c1 heme lyases signature 1.
CONSENSUS: H-N-x(2)-N-E-x(2)-W-[NQKR]-x(4)-W-E.

NAME: Cytochrome c and c1 heme lyases signature 2.
CONSENSUS: P-F-D-R-H-D-W.



NAME: Adenylate cyclases class-I signature 1.
CONSENSUS: E-Y-F-G-[SA](2)-L-W-x-L-Y-K.

NAME: Adenylate cyclases class-I signature 2.
CONSENSUS: Y-R-N-x-W-[NS]-E-[LIVM]-R-T-L-H-F-x-G.

NAME: Guanylate cyclases signature.
CONSENSUS: G-V-[LIVM]-x(0,1)-G-x(5)-[FY]-x-[LIVM]-[FYW]-[GS]-[DNTHKW]-[DNT]-[IV]-[DNTA]-x(5)-[DE].

NAME: Chorismate synthase signature 1.
CONSENSUS: G-E-S-H-[GC]-x(2)-[LIVM]-[GTV]-x-[LIVM](2)-[DE]-G-x-[PV].

NAME: Chorismate synthase signature 2.
CONSENSUS: [GE]-R-[SA](2)-[SAG]-R-[EV]-[ST]-x(2)-[RH]-V-x(2)-G.

NAME: Chorismate synthase signature 3.
CONSENSUS: R-[SH]-D-[PSV]-[CSAV]-x(4)-[GAI]-x-[IVGSP]-[LIVM]-x-E-[STAH]-[LIVM].



NAME: 6-pyruvoyl tetrahydropterin synthase signature 1.
CONSENSUS: C-N-N-x(2)-G-H-G-H-N-Y.

NAME: 6-pyruvoyl tetrahydropterin synthase signature 2.
CONSENSUS: D-H-K-N-L-D-x-D.

NAME: Ferrochelatase signature.
CONSENSUS: [LIVMF](2)-x-S-x-H-[GS]-[LIVM]-P-x(4,5)-[DENQKR]-x-G-D-x-Y.

NAME: Alanine racemase pyridoxal-phosphate attachment site.
CONSENSUS: V-x-K-A-[DN]-[GA]-Y-G-H-G.

NAME: Aspartate and glutamate racemases signature 1.
CONSENSUS: [IVA]-[LIVM]-x-C-x(0,1)-N-[ST]-[MSA]-[STH]-[LIVFYSTANK].

NAME: Aspartate and glutamate racemases signature 2.
CONSENSUS: [LIVM](2)-x-[AG]-C-T-[DEH]-[LIVMFY]-[PNGRS]-x-[LIVM].

NAME: Mandelate racemase / muconate lactonizing enzyme family signature 1.
CONSENSUS: A-x-[SAG](2)-[LIVM]-[DEQ]-x-A-x(2)-D-x(2)-[GA]-[KR].

NAME: Mandelate racemase / muconate lactonizing enzyme family signature 2.
CONSENSUS: G-x(7)-D-x(9)-A-x(14)-[LIVM]-E-[DENQ]-P-x(4)-[DENQ].

NAME: Ribulose-phosphate 3-epimerase family signature 1.
CONSENSUS: [LIVMF]-H-[LIVMFY]-D-[LIVM]-x-D-x(1,2)-[FY]-[LIVM]-x-N-x-[STAV].

NAME: Ribulose-phosphate 3-epimerase family signature 2.
CONSENSUS: [LIVMA]-x-[LIVM]-M-[ST]-[VS]-x-P-x(3)-G-Q-x-F-x(6)-[NK]-[LIVMC].

NAME: Aldose 1-epimerase putative active site.
CONSENSUS: [NS]-x-T-N-H-x-Y-[FW]-N-[LI].

NAME: Cyclophilin-type peptidyl-prolyl cis-trans isomerase signature.
CONSENSUS: [FY]-x(2)-[STCNLV]-x-F-H-[RH]-[LIVMN]-[LIVM]-x(2)-F-[LIVM]-x-Q-[AG]-G.

NAME: Cyclophilin-type peptidyl-prolyl cis-trans isomerase profile.

NAME: FKBP-type peptidyl-prolyl cis-trans isomerase signature 1.
CONSENSUS: [LIVMC]-x-[YF]-x-[GVL]-x(1,2)-[LFT]-x(2)-G-x(3)-[DE]-[STAEQK]-[STAN].

NAME: FKBP-type peptidyl-prolyl cis-trans isomerase signature 2.
CONSENSUS: [LIVMFY]-x(2)-[GA]-x(3,4)-[LIVMF]-x(2)-[LIVMFHK]-x(2)-G-x(4)-[LIVMF]-x(3)-[PSGAQ]-x(2)-[AG]-[FY]-G.



NAME: FKBP-type peptidyl-prolyl cis-trans isomerase domain profile.



NAME: PpiC-type peptidyl-prolyl cis-trans isomerase signature.
CONSENSUS: F-[GSADEI]-x-[LVAQ]-A-x(3)-[ST]-x(3,4)-[STQ]-x(3,5)-[GER]-G-x-[LIVM]-[GS].

NAME: Triosephosphate isomerase active site.
CONSENSUS: [AV]-Y-E-P-[LIVM]-W-[SA]-I-G-T-[GK].

NAME: Xylose isomerase signature 1.
CONSENSUS: [LI]-E-P-K-P-x(2)-P.

NAME: Xylose isomerase signature 2.
CONSENSUS: [FL]-H-D-x-D-[LIV]-x-[PD]-x-[GDE].



NAME: Phosphomannose isomerase type I signature 1.
CONSENSUS: Y-x-D-x-N-H-K-P-E.

NAME: Phosphomannose isomerase type I signature 2.
CONSENSUS: H-A-Y-[LIVM]-x-G-x(2)-[LIVM]-E-x-M-A-x-S-D-N-x-[LIVM]-R-A-G-x-T-P-K.

NAME: Phosphoglucose isomerase signature 1.
CONSENSUS: [DENS]-x-[LIVM]-G-G-R-[FY]-S-[LIVMT]-x-[STA]-[PSAC]-[LIVMA]-G.

NAME: Phosphoglucose isomerase signature 2.
CONSENSUS: [GS]-x-[LIVM]-[LIVMFYW]-x(4)-[FY]-[DN]-Q-x-G-V-E-x(2)-K.

NAME: Glucosamine/galactosamine-6-phosphate isomerases signature.
CONSENSUS: [LIVM]-x(3)-G-x-[LIT]-x-[LIV]-x-[LIVM]-x-G-[LIVM]-G-x-[DEN]-G-H.

NAME: Phosphoglycerate mutase family phosphohistidine signature.
CONSENSUS: [LIVM]-x-R-H-G-[EQ]-x(3)-N.

NAME: Phosphoglucomutase and phosphomannomutase phosphoserine signature.
CONSENSUS: [GSA]-[LIVM]-x-[LIVM]-[ST]-[PGA]-S-H-x-P-x(4)-[GNHE].

NAME: Methylmalonyl-CoA mutase signature.
CONSENSUS: R-I-A-R-N-[TQ]-x(2)-[LIVMFY](2)-x-[EQ]-E-x(4)-[KRN]-x(2)-D-P-x-[GSA]-G-S.

NAME: Terpene synthases signature.
CONSENSUS: [DE]-G-S-W-x-G-x-W-[GA]-[LIVM]-x-[FY]-x-Y-[GA].

NAME: Eukaryotic DNA topoisomerase I active site.
CONSENSUS: [DEN]-x(6)-[GS]-[IT]-S-K-x(2)-Y-[LIVM]-x(3)-[LIVM].

NAME: Prokaryotic DNA topoisomerase I active site.
CONSENSUS: [EQ]-x-L-Y-[DEQST]-x(3,12)-[LI]-[ST]-Y-x-R-[ST]-[DEQS].

NAME: DNA topoisomerase II signature.
CONSENSUS: [LIVMA]-x-E-G-[DN]-S-A-x-[STAG].



NAME: Aminoacyl-transfer RNA synthetases class-I signature.
CONSENSUS: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-[LIVMFYSTAGPC].

NAME: Aminoacyl-transfer RNA synthetases class-II signature 1.
CONSENSUS: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE].

NAME: Aminoacyl-transfer RNA synthetases class-II signature 2.
CONSENSUS: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY].

NAME: WHEP-TRS domain signature.
CONSENSUS: [QY]-G-[DNEA]-x-[LIV]-[KR]-x(2)-K-x(2)-[KRNG]-[AS]-x(4)-[LIV]-[DENK]-x(2)-[IV]-x(2)-L-x(3)-K.

NAME: ATP-citrate lyase / succinyl-CoA ligases family signature 1.
CONSENSUS: S-[KR]-S-G-[GT]-[LIVM]-[GST]-x-[EQ]-x(8,10)-G-x(4)-[LIVM]-[GA]-[LIVM]-G-G-D.

NAME: ATP-citrate lyase / succinyl-CoA ligases family active site.
CONSENSUS: G-x(2)-A-x(4,7)-[RQT]-[LIVMF]-G-H-[AS]-[GH].

NAME: ATP-citrate lyase / succinyl-CoA ligases family signature 3.
CONSENSUS: G-x-[IV]-x(2)-[LIVMF]-x-[NA]-G-[GA]-G-[LA]-[STAV]-x(4)-D-x-[LIVM]-x(3)-G-[GRE].

NAME: Glutamine synthetase signature 1.
CONSENSUS: [FYWL]-D-G-S-S-x(6,8)-[DENQSTAK]-[SA]-[DE]-x(2)-[LIVMFY].

NAME: Glutamine synthetase putative ATP-binding region signature.
CONSENSUS: K-P-[LIVMFYA]-x(3,5)-[NPAT]-G-[GSTAN]-G-x-H-x(3)-S.

NAME: Glutamine synthetase class-I adenylation site.
CONSENSUS: K-[LIVM]-x(5)-[LIVMA]-D-[RK]-[DN]-[LI]-Y.



NAME: D-alanine--D-alanine ligase signature 1.
CONSENSUS: H-G-x(2)-G-E-D-G-x-[LIVMA]-[QSA]-[GSA].

NAME: D-alanine--D-alanine ligase signature 2.
CONSENSUS: [LIV]-x(3)-[GA]-x-[GSAIV]-R-[LIVCA]-D-[LIVMF](2)-x(7,9)-[LI]-x-E-[LIVA]-N-[STP]-x-P-[GA].

NAME: SAICAR synthetase signature 1.
CONSENSUS: [LIVMF](2)-P-[LIVM]-E-x-[LIVM]-[LIVMCA]-R-x(3)-[TA]-G-S.

NAME: SAICAR synthetase signature 2.
CONSENSUS: [LIVM]-[LIVMA]-D-x-K-[LIVMFY]-E-F-G.



NAME: Folylpolyglutamate synthase signature 1.
CONSENSUS: [LIVMFY]-x-[LIVM]-[STAG]-G-T-[NK]-G-K-x-[ST]-x(7)-[LIVM](2)-x(3)-[GSK].

NAME: Folylpolyglutamate synthase signature 2.
CONSENSUS: [LIVMFY](2)-E-x-G-[LIVM]-[GA]-G-x(2)-D-x-[GST]-x-[LIVM](2).

NAME: Ubiquitin-activating enzyme signature 1.
CONSENSUS: K-A-C-S-G-K-F-x-P.

NAME: Ubiquitin-activating enzyme active site.
CONSENSUS: P-[LIVM]-C-T-[LIVM]-[KRH]-x-[FT]-P.

NAME: Ubiquitin-conjugating enzymes active site.
CONSENSUS: [FYWLSP]-H-[PC]-[NH]-[LIV]-x(3,4)-G-x-[LIV]-C-[LIV]-x-[LIV].

NAME: Formate--tetrahydrofolate ligase signature 1.
CONSENSUS: G-[LIVM]-K-G-G-A-A-G-G-G-Y.

NAME: Formate--tetrahydrofolate ligase signature 2.
CONSENSUS: V-A-T-[IV]-R-A-L-K-x-[HN]-G-G.

NAME: Adenylosuccinate synthetase GTP-binding site.
CONSENSUS: Q-W-G-D-E-G-K-G.

NAME: Adenylosuccinate synthetase active site.
CONSENSUS: G-I-[GR]-P-x-Y-x(2)-K-x(2)-R.

NAME: Argininosuccinate synthase signature 1.
CONSENSUS: A-[FY]-S-G-G-L-D-T-S.

NAME: Argininosuccinate synthase signature 2.
CONSENSUS: G-x-T-x-K-G-N-D-x(2)-R-F.

NAME: Phosphoribosylglycinamide synthetase signature.
CONSENSUS: R-F-G-D-P-E-x-[QM].

NAME: Carbamoyl-phosphate synthase subdomain signature 1.
CONSENSUS: [FYV]-[PS]-[LIVMC]-[LIVMA]-[LIVM]-[KR]-[PSA]-[STA]-x(3)-[SG]-G-x-[AG].

NAME: Carbamoyl-phosphate synthase subdomain signature 2.
CONSENSUS: [LIVMF]-[LIMN]-E-[LIVMCA]-N-[PATLIVM]-[KR]-[LIVMSTAC].



NAME: ATP-dependent DNA ligase AMP-binding site.
CONSENSUS: [EDQH]-x-K-x-[DN]-G-x-R-[GACIVM].

NAME: ATP-dependent DNA ligase signature 2.
CONSENSUS: E-G-[LIVMA]-[LIVM](2)-[KR]-x(5,8)-[YW]-[QNEK]-x(2,6)-[KRH]-x(3,5)-K-[LIVMFY]-K.

NAME: NAD-dependent DNA ligase signature 1.
CONSENSUS: K-[LIVM]-D-G-[LIVM]-[SA]-x(4)-Y-x(2)-G-x-L-x(4)-[ST]-R-G-[DN]-G-x(2)-G-[DE]-[DENL].

NAME: NAD-dependent DNA ligase signature 2.
CONSENSUS: [IV]-G-[KR]-[ST]-G-x-[LIVM]-[STNK]-x-[VT]-x(2)-L-x-[PS]-V.

NAME: RNA 3'-terminal phosphate cyclase signature.
CONSENSUS: [RH]-G-x(2)-P-x-G(3)-x-[LIV].

NAME: Lipoate-protein ligase B signature.
CONSENSUS: R-G-G-x(2)-T-[FYW]-H-x(2)-[GH]-Q-x-[LIV]-x-Y.

NAME: Isopenicillin N synthetase signature 1.
CONSENSUS: [RK]-x-[STA]-x(2)-S-x-C-Y-[SL].

NAME: Isopenicillin N synthetase signature 2.
CONSENSUS: [LIVM](2)-x-C-G-[STA]-x(2)-[STAG]-x(2)-T-x-[DNG].

NAME: Site-specific recombinases active site.
CONSENSUS: Y-[LIVAC]-R-[VA]-S-[ST]-x(2)-Q.

NAME: Site-specific recombinases signature 2.
CONSENSUS: G-[DE]-x(2)-[LIVM]-x(3)-[LIVM]-[DT]-R-[LIVM]-[GSA].

NAME: Transposases, Mutator family, signature.
CONSENSUS: D-x(3)-G-[LIVMF]-x(6)-[STAV]-[LIVMFYW]-[PT]-x-[STAV]-x(2)-[QR]-x-C-x(2)-H.

NAME: Transposases, IS30 family, signature.
CONSENSUS: R-G-x(2)-E-N-x-N-G-[LIVM](2)-R-[QE]-[LIVMFY](2)-P-K.

NAME: Autoinducers synthetases family signature.
CONSENSUS: [LMFY]-R-x(3)-F-x(2)-[KR]-x(2)-W-x-[LIVM]-x(6,9)-E-x-D-x-[FY]-D.

NAME: Thiamine pyrophosphate enzymes signature.
CONSENSUS: [LIVMF]-[GSA]-x(5)-P-x(4)-[LIVMFYW]-x-[LIVMF]-x-G-D-[GSA]-[GSAC].

NAME: Biotin-requiring enzymes attachment site.
CONSENSUS: [GN]-[DEQTR]-x-[LIVMFY]-x(2)-[LIVM]-x-[AIV]-M-K-[LMAT]-x(3)-[LIVM]-x-[SAV].

NAME: 2-oxo acid dehydrogenases acyltransferase component lipoyl binding site.
CONSENSUS: [GN]-x(2)-[LIVF]-x(5)-[LIVFC]-x(2)-[LIVFA]-x(3)-K-[STAIV]-[STAVQDN]-x(2)-[LIVMFS]-x(5)-[GCN]-x-[LIVMFY].

NAME: Putative AMP-binding domain signature.
CONSENSUS: [LIVMFY]-x(2)-[STG]-[STAG]-G-[ST]-[STEI]-[SG]-x-[PASLIVM]-[KR].

NAME: Molybdenum cofactor biosynthesis proteins signature 1.
CONSENSUS: [LIVM](3)-[LIT](2)-G-G-T-G-x(4)-D.

NAME: Molybdenum cofactor biosynthesis proteins signature 2.
CONSENSUS: S-x-[GS]-x(2)-D-x(5)-[LIVW]-x(10,12)-[LIV]-x(2)-[KR]-P-G-[KRL]-P-x(2)-[LIVMF]-[GA].

NAME: moaA / nifB / pqqE family signature.
CONSENSUS: [LIV]-x(3)-C-[NP]-[LIVMF]-[GRS]-C-x-[FYM]-C.

NAME: Radical activating enzymes signature.
CONSENSUS: [GV]-x-G-x-[KR]-x(3)-F-x(2)-G-x(0,1)-C-x(3)-C-x(2)-C-x-[NL].

NAME: Tpx family signature.
CONSENSUS: S-x-D-L-P-F-A-x(2)-[KR]-[FW]-C.

NAME: Cytochrome c family heme-binding site signature.
CONSENSUS: C-{CPWHF}-{CPWR}-C-H-{CFYW}.

NAME: Cytochrome b5 family, heme-binding domain signature.
CONSENSUS: [FY]-[LIVMK]-x(2)-H-P-[GA]-G.

NAME: Cytochrome b/b6 heme-ligand signature.
CONSENSUS: [DENQ]-x(3)-G-[FYWMQ]-x-[LIVMF]-R-x(2)-H.

NAME: Cytochrome b/b6 Qo site signature.
CONSENSUS: P-[DE]-W-[FY]-[LFY](2).

NAME: Cytochrome b559 subunits heme-binding site signature.
CONSENSUS: [LIV]-x-[ST]-[LIVF]-R-[FYW]-x(2)-[IV]-H-[STGA]-[LIV]-[STGA]-[IV]-P.

NAME: Nickel-dependent hydrogenases b-type cytochrome subunit signature 1.
CONSENSUS: R-[LIVMFYW]-x-H-W-[LIVM]-x(2)-[LIVMF]-[STAC]-[LIVM]-x(2)-L-x-[LIVM]-T-G.

NAME: Nickel-dependent hydrogenases b-type cytochrome subunit signature 2.
CONSENSUS: [RH]-[STA]-[LIVMFYW]-H-[RH]-[LIVM]-x(2)-W-x-[LIVMF]-x(2)-F-x(3)-H.

NAME: Succinate dehydrogenase cytochrome b subunit signature 1.
CONSENSUS: R-P-[LIVMT]-x(3)-[LIVM]-x(6)-[LIVMWPK]-x(4)-S-x(2)-H-R-x-[ST].

NAME: Succinate dehydrogenase cytochrome b subunit signature 2.
CONSENSUS: H-x(3)-[GA]-[LIVMT]-R-[HF]-[LIVMF]-x-[FYWM]-D-x-[GVA].

NAME: Thioredoxin family active site.
CONSENSUS: [LIVMF]-[LIVMSTA]-x-[LIVMFYC]-[FYWSTHE]-x(2)-[FYWGTN]-C-[GATPLVE]-[PHYWSTA]-C-x(6)-[LIVMFYWT].

NAME: Glutaredoxin active site.
CONSENSUS: [LIVD]-[FYSA]-x(4)-C-[PV]-[FYW]-C-x(2)-[TAV]-x(2,3)-[LIV].

NAME: Type-1 copper (blue) proteins signature.
CONSENSUS: [GA]-x(0,2)-[YSA]-x(0,1)-[VFY]-x-C-x(1,2)-[PG]-x(0,1)-H-x(2,4)-[MQ].

NAME: 2Fe-2S ferredoxins, iron-sulfur binding region signature.
CONSENSUS: C-{C}-{C}-[GA]-{C}-C-[GAST]-{CPDEKRHFYW}-C.

NAME: Adrenodoxin family, iron-sulfur binding region signature.
CONSENSUS: C-x(2)-[STAQ]-x-[STAMV]-C-[STA]-T-C-[HR].

NAME: 4Fe-4S ferredoxins, iron-sulfur binding region signature.
CONSENSUS: C-x(2)-C-x(2)-C-x(3)-C-[PEG].

NAME: High potential iron-sulfur proteins signature.
CONSENSUS: C-x(6,9)-[LIVM]-x(3)-G-[YW]-C-x(2)-[FYW].

NAME: Rieske iron-sulfur protein signature 1.
CONSENSUS: C-[TK]-H-L-G-C-[LIVT].

NAME: Rieske iron-sulfur protein signature 2.
CONSENSUS: C-P-C-H-x-[GSA].

NAME: Flavodoxin signature.
CONSENSUS: [LIV]-[LIVFY]-[FY]-x-[ST]-x(2)-[AGC]-x-T-x(3)-A-x(2)-[LIV].

NAME: Rubredoxin signature.
CONSENSUS: [LIVM]-x(3)-W-x-C-P-x-C-[AGD].

NAME: Electron transfer flavoprotein alpha-subunit signature.
CONSENSUS: [LI]-Y-[LIVM]-[AT]-x-G-[IV]-[SD]-G-x-[IV]-Q-H-x(2)-G-x(6)-[IV]-x-A-[IV]-N.

NAME: Electron transfer flavoprotein beta-subunit signature.
CONSENSUS: [IVA]-x-[KR]-x(2)-[DE]-[GD]-[GDE]-x(1,2)-[EQ]-x-[LIV]-x(4)-P-x-[LIVM](2)-[TAC].

NAME: Vertebrate metallothioneins signature.
CONSENSUS: C-x-C-[GSTAP]-x(2)-C-x-C-x(2)-C-x-C-x(2)-C-x-K.

NAME: Ferritin iron-binding regions signature 1.
CONSENSUS: E-x-[KR]-E-x(2)-E-[KR]-[LF]-[LIVMA]-x(2)-Q-N-x-R-x-G-R.

NAME: Ferritin iron-binding regions signature 2.
CONSENSUS: D-x(2)-[LIVMF]-[STAC]-[DH]-F-[LI]-[EN]-x(2)-[FY]-L-x(6)-[LIVM]-[KN].

NAME: Bacterioferritin signature.
CONSENSUS: <M-x-G-x(3)-V-[LIV]-x(2)-[LM]-x(3)-L-x(3)-L.

NAME: Transferrins signature 1.
CONSENSUS: Y-x(0,1)-[VAS]-V-[IVAC]-[IVA]-[IVA]-[RKH]-[RKS]-[GDENSA].

NAME: Transferrins signature 2.
CONSENSUS: Y-x-G-A-[FL]-[KRHNQ]-C-L-x(3,4)-G-[DENQ]-V-[GA]-[FYW].

NAME: Transferrins signature 3.
CONSENSUS: [DENQ]-[YF]-x-[LY]-L-C-x-[DN]-x(5,8)-[LIV]-x(4,5)-C-x(2)-A-x(4)-[HQR]-x-[LIVMFYW]-[LIVM].

NAME: Globins profile.

NAME: Protozoan/cyanobacterial globins signature.
CONSENSUS: F-[LF]-x(5)-G-[PA]-x(4)-G-[KRA]-x-[LIVM]-x(3)-H.

NAME: Plant hemoglobins signature.
CONSENSUS: [SN]-P-x-L-x(2)-H-A-x(3)-F.

NAME: Hemerythrins signature.
CONSENSUS: W-L-x-[NQ]-H-I-x(3)-D-F.

NAME: Arthropod hemocyanins / insect LSPs signature 1.
CONSENSUS: Y-[FYW]-x-E-D-[LIVM]-x(2)-N-x(6)-H-x(3)-P.

NAME: Arthropod hemocyanins / insect LSPs signature 2.
CONSENSUS: T-x(2)-R-D-P-x-[FY]-[FYW].

NAME: Heavy-metal-associated domain.
CONSENSUS: [LIVN]-x(2)-[LIVMFA]-x-C-x-[STAGCDNH]-C-x(3)-[LIVFG]-x(3)-[LIV]-x(9,11)-[IVA]-x-[LVFYS].

NAME: ABC transporters family signature.
CONSENSUS: [LIVMFYC]-[SA]-[SAPGLVFYKQH]-G-[DENQMW]-[KRQASPCLIMFW]-[KRNQSTAVM]-[KRACLVM]-[LIVMFYPAN]-{PHY}-[LIVMFW]-[SAGCLIVP]-{FYWHP}-{KRHP}-[LIVMFYWSTA].

NAME: Binding-protein-dependent transport systems inner membrane comp. sign.
CONSENSUS: [LIVMFY]-x(8)-[EQR]-[STAGV]-[STAG]-x(3)-G-[LIVMFYSTAC]-x(5)-[LIVMFYSTA]-x(4)-[LIVMFY]-[PKR].

NAME: ABC-2 type transport system integral membrane proteins signature.
CONSENSUS: [LIMST]-x(2)-[LIMW]-x(2)-[LIMCA]-[GSTC]-x-[GSAIV]-x(6)-[LIMGA]-[PGSNQ]-x(9,12)-P-[LIMFT]-x-[HRSY]-x(5)-[RQ].

NAME: Bacterial extracellular solute-binding proteins, family 1 signature.
CONSENSUS: [GAP]-[LIVMFA]-[STAVDN]-x(4)-[GSAV]-[LIVMFY](2)-Y-[ND]-x(3)-[LIVMF]-x-[KNDE].

NAME: Bacterial extracellular solute-binding proteins, family 3 signature.
CONSENSUS: G-[FYIL]-[DE]-[LIVMT]-[DE]-[LIVMF]-x(3)-[LIVMA]-[VAGC]-x(2)-[LIVMAGN].

NAME: Bacterial extracellular solute-binding proteins, family 5 signature.
CONSENSUS: [AG]-x(6,7)-[DNEG]-x(2)-[STAVE]-[LIVMFYWA]-x-[LIVMFY]-x-[LIVM]-[KR]-[KRHDE]-[GDN]-[LIVMA]-[KNGSP]-[FW].

NAME: Serum albumin family signature.
CONSENSUS: [FY]-x(6)-C-C-x(7)-C-[LFY]-x(6)-[LIVMFYW].



NAME: Transthyretin signature 1.
CONSENSUS: S-K-C-P-L-M-V-K-V-L-D-[AS]-V-R-G.

NAME: Transthyretin signature 2.
CONSENSUS: S-P-[FY]-S-[FY]-S-T-T-A-[LIVM]-V-[ST]-x-P.

NAME: Avidin / Streptavidin family signature.
CONSENSUS: [DEN]-x(2)-[KR]-[STA]-x(2)-V-G-x-[DN]-x-[FW]-T-[KR].

NAME: Eukaryotic cobalamin-binding proteins signature.
CONSENSUS: [SN]-V-D-T-[GA]-A-[LIVM]-A-x-L-A-[LIVMF]-T-C.

NAME: Lipocalin signature.
CONSENSUS: [DENG]-x-[DENQGSTARK]-x(0,2)-[DENQARK]-[LIVFY]-{CP}-G-{C}-W-[FYWLRH]-x-[LIVMTA].

NAME: Cytosolic fatty-acid binding proteins signature.
CONSENSUS: [GSAIVK]-x-[FYW]-x-[LIVMF]-x(4)-[NHG]-[FY]-[DE]-x-[LIVMFY]-[LIVM]-x(2)-[LIVMAKR].

NAME: Acyl-CoA-binding protein signature.
CONSENSUS: P-[STA]-x-[DEN]-x-[LIVMF]-x(2)-[LIVMFY]-Y-[GSTA]-x-[FY]-K-Q-[STA](2)-x-G.

NAME: LBP / BPI / CETP family signature.
CONSENSUS: [PA]-[GA]-[LIVMC]-x(2)-R-[IV]-[ST]-x(3)-L-x(5)-[EQ]-x(4)-[LIVM]-[EQK]-x(8)-P.

NAME: Phosphatidylethanolamine-binding protein family signature.
CONSENSUS: [FY]-x-[LIVMF](3)-x-[DC]-P-D-x-P-[SN]-x(10)-H.

NAME: Plant lipid transfer proteins signature.
CONSENSUS: [LIVM]-[PA]-x(2)-C-x-[LIVM]-x-[LIVM]-x-[LIVMFY]-x-[LIVM]-[ST]-x(3)-[DN]-C-x(2)-[LIVM].

NAME: Uteroglobin family signature 1.
CONSENSUS: [GA]-x(3)-I-C-P-x-[LIVMF]-x(3)-[LIVM]-[DE]-x-[LIVMF](2).

NAME: Uteroglobin family signature 2.
CONSENSUS: [DEQ]-x(4)-[SN]-x(5)-[DEQ]-x-I-x(2)-S-[PSE]-[LS]-C.

NAME: Mitochondrial energy transfer proteins signature.
CONSENSUS: P-x-[DE]-x-[LIVAT]-[RK]-x-[LRH]-[LIVMFY]-[QMAIGV].

NAME: Sugar transport proteins signature 1.
CONSENSUS: [LIVMSTAG]-[LIVMFSAG]-x(2)-[LIVMSA]-[DE]-x-[LIVMFYWA]-G-R-[RK]-x(4,6)-[GSTA].

NAME: Sugar transport proteins signature 2.
CONSENSUS: [LIVMF]-x-G-[LIVMFA]-x(2)-G-x(8)-[LIFY]-x(2)-[EQ]-x(6)-[RK].

NAME: LacY family proton/sugar symporters signature 1.
CONSENSUS: G-[LIVM](2)-x-D-[RK]-L-G-L-[RK](2)-x-[LIVM](2)-W.

NAME: LacY family proton/sugar symporters signature 2.
CONSENSUS: P-x-[LIVMF](2)-N-R-[LIVM]-G-x-K-N-[STA]-[LIVM](3).

NAME: PTR2 family proton/oligopeptide symporters signature 1.
CONSENSUS: [GA]-[GAS]-[LIVMFYWA]-[LIVM]-[GAS]-D-x-[LIVMFYWT]-[LIVMFYW]-G-x(3)-[TAV]-[IV]-x(3)-[GSTAV]-x-[LIVMF]-x(3)-[GA].

NAME: PTR2 family proton/oligopeptide symporters signature 2.
CONSENSUS: [FYT]-x(2)-[LMFY]-[FYV]-[LIVMFYWA]-x-[IVG]-N-[LIVMAG]-G-[GSA]-[LIMF].

NAME: Amiloride-sensitive sodium channels signature.
CONSENSUS: Y-x(2)-[EQTF]-x-C-x(2)-[GSTDNL]-C-x-[QT]-x(2)-[LIVMT]-[LIVMS]-x(2)-C-x-C.

NAME: Sodium:alanine symporter family signature.
CONSENSUS: G-G-x-[GA](2)-[LIVM]-F-W-M-W-[LIVM]-x-[STAV]-[LIVMFA](2)-G.

NAME: Sodium:dicarboxylate symporter family signature 1.
CONSENSUS: P-x(0,1)-G-[DE]-x-[LIVMF](2)-x-[LIVM](2)-[KREQ]-[LIVM](3)-x-P.

NAME: Sodium:dicarboxylate symporter family signature 2.
CONSENSUS: P-x-G-x-[STA]-x-[NT]-[LIVMC]-D-G-[STAN]-x-[LIVM]-[FY]-x(2)-[LIVM]-x(2)-[LIVM]-[FY]-[LI]-[SA]-Q.

NAME: Sodium:galactoside symporter family signature.
CONSENSUS: D-x(3)-G-x(3)-[DN]-x(6,8)-G-[KH]-F-[KR]-P-[FYW]-[LIVM](2)-x-[GSTA](2).

NAME: Sodium:neurotransmitter symporter family signature 1.
CONSENSUS: W-R-F-[GP]-Y-x(4)-N-G-G-G-x-[FY].

NAME: Sodium:neurotransmitter symporter family signature 2.
CONSENSUS: Y-[LIVMFY]-x(2)-[SC]-[LIVMFY]-[STQ]-x(2)-L-P-W-x(2)-C-x(4)-N-[GST].

NAME: Sodium:solute symporter family signature 1.
CONSENSUS: [GS]-x(2)-[LIY]-x(3)-[LIVMFYWSTAG](10)-[LIY]-[TAV]-x(2)-G-G-[LMF]-x-[SAP].

NAME: Sodium:solute symporter family signature 2.
CONSENSUS: [GAST]-[LIVM]-x(3)-[KR]-x(4)-G-A-x(2)-[GAS]-[LIVMGS]-[LIVMW]-[LIVMGAT]-G-x-[LIVMG].

NAME: Sodium:sulfate symporter family signature.
CONSENSUS: [STACP]-S-x(2)-F-x(2)-P-[LIVM]-[GSA]-x(3)-N-x-[LIVM]-V.

NAME: glpT family of transporters signature.
CONSENSUS: R-G-x(5)-W-N-x(2)-H-N-x-G-G.




NAME: Ammonium transporters signature.
CONSENSUS: D-[FYWS]-A-G-[GSC]-x(2)-[IV]-x(3)-[SAG](2)-x(2)-[SAG]-[LIVMF]-x(3)-[LIVMFYWA](2)-x-[GK]-x-R.

NAME: BCCT family of transporters signature.
CONSENSUS: [GSDN]-W-T-[LIVM]-x-[FY]-W-x-W-W.

NAME: Flagellar motor protein motA family signature.
CONSENSUS: A-[LMF]-x-[GAT]-T-[LIVF]-x-G-x-[LIVMF]-x(7)-P.

NAME: Formate and nitrite transporters signature 1.
CONSENSUS: [LIVMA]-[LIVMY]-x-G-[GSTA]-[DES]-L-[FI]-[TN]-[GS].



NAME: Formate and nitrite transporters signature 2.
CONSENSUS: [GA]-x(2)-[CA]-N-[LIVMFYW](2)-V-C-[LV]-A.

NAME: Prokaryotic sulfate-binding proteins signature 1.
CONSENSUS: K-x-[NQEK]-[GT]-G-[DQ]-x-[LIVM]-x(3)-Q-S.

NAME: Prokaryotic sulfate-binding proteins signature 2.
CONSENSUS: N-P-K-[ST]-S-G-x-A-R.

NAME: Sulfate transporters signature.
CONSENSUS: P-x-Y-[GS]-L-Y-[STAG](2)-x(4)-[LIVMFY](3)-x(3)-[GSTA](2)-S-[KR].

NAME: Amino acid permeases signature.
CONSENSUS: [STAGC]-G-[PAG]-x(2,3)-[LIVMFYWA](2)-x-[LIVMFYW]-x-[LIVMFWSTAGC](2)-[STAGC]-x(3)-[LIVMFYWT]-x-[LIVMST]-x(3)-[LIMCTA]-[GA]-E-x(5)-[PSAL].

NAME: Aromatic amino acids permeases signature.
CONSENSUS: I-G-[GA]-G-M-[LF]-[SA]-x-P-x(3)-[SA]-G-x(2)-F.

NAME: Xanthine/uracil permeases family signature.
CONSENSUS: [LIVM]-P-x-[PASIF]-V-[LIVM]-G-G-x(4)-[LIVM]-[FY]-[GSA]-x-[LIVM]-x(3)-G.

NAME: Anion exchangers family signature 1.
CONSENSUS: F-G-G-[LIVM](2)-[KR]-D-[LIVM]-[RK]-R-R-Y.

NAME: Anion exchangers family signature 2.
CONSENSUS: [FI]-L-I-S-L-I-F-I-Y-E-T-F-x-K-L.



NAME: MIP family signature.
CONSENSUS: [HNQA]-x-N-P-[STA]-[LIVMF]-[ST]-[LIVMF]-[GSTAFY].

NAME: General diffusion Gram-negative porins signature.
CONSENSUS: [LIVMFY]-x(2)-G-x(2)-Y-x-F-x-K-x(2)-[SN]-[STAV]-[LIVMFYW]-V.

NAME: OmpA-like domain.
CONSENSUS: [LIVMA]-x-[GT]-x-[TA]-[DA]-x(2)-[DG]-[GSTP]-x(2)-[LFYDE]-[NQS]-x(2)-[LI]-[SG]-[QE]-[KRQE]-R-A-x(2)-[LV]-x(3)-[LIVMF]-x(4,5)-[LIVM]-x(4)-[LIVM]-x(3)-[SG]-x-G.

NAME: Eukaryotic mitochondrial porin signature.
CONSENSUS: [YH]-x(2)-D-[SPA]-x-[STA]-x(3)-[TAG]-[KR]-[LIVMF]-[DNSTA]-[DNS]-x(4)-[GSTAN]-[LIVMA]-x-[LIVMY].

NAME: Insulin-like growth factor binding proteins signature.
CONSENSUS: G-C-[GS]-C-C-x(2)-C-A-x(6)-C.

NAME: GPR1/FUN34/yaaH family signature.
CONSENSUS: N-P-[AV]-P-[LF]-G-L-x-[GSA]-F.

NAME: GNS1/SUR4 family signature.
CONSENSUS: L-x-F-L-H-x-Y-H-H.

NAME: 43 Kd postsynaptic protein signature.
CONSENSUS: G-Q-D-Q-T-K-Q-Q-I.

NAME: Actins signature 1.
CONSENSUS: [FY]-[LIV]-G-[DE]-E-A-Q-x-[RKQ](2)-G.

NAME: Actins signature 2.
CONSENSUS: W-[IV]-[STA]-[RK]-x-[DE]-Y-[DNE]-[DE].

NAME: Actins and actin-related proteins signature.
CONSENSUS: [LM]-[LIVM]-T-E-[GAPQ]-x-[LIVMFYWHQ]-N-[PSTAQ]-x(2)-N-[KR].

NAME: Annexins repeated domain signature.

CONSENSUS: [TG]-[STV]-x(8)-[LIVMF]-x(2)-R-x(3)-[DEQNH]-x(7)-[IFY]-x(7)-[LIVMF]-x(3)-[LIVMF]-x(11)-[LIVMFA]-x(2)-[LIVMF].

NAME: Caveolins signature.

CONSENSUS: F-E-D-V-I-A-E-P.

NAME: Clathrin light chain signature 1.

CONSENSUS: F-L-A-Q-Q-E-S.

NAME: Clathrin light chain signature 2.

CONSENSUS: [KR]-D-x-S-[KR]-[LIVM]-[KR]-x-[LIVM](3)-x-L-K.

NAME: Clusterin signature 1.

CONSENSUS: C-K-P-C-L-K-x-T-C.
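The consensus entries in this listing use the PROSITE pattern notation: residues separated by hyphens, `x` for any residue, `[...]` for a set of allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repeat counts, and `<` / `>` for sequence-end anchors. The following is a minimal sketch, not part of the original disclosure, of how such a pattern might be translated into a regular expression and used to scan a protein sequence. The function names and the test sequence are hypothetical, and elements such as `[G>]` are deliberately not handled.

```python
import re

# One pattern element: x, a single residue, [allowed] or {forbidden},
# optionally followed by a repeat count (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def consensus_to_regex(consensus):
    """Translate a PROSITE-style consensus pattern into a regex string."""
    body = consensus.strip().rstrip(".")
    start = end = ""
    if body.startswith("<"):      # anchored at the N-terminus
        start, body = "^", body[1:]
    if body.endswith(">"):        # anchored at the C-terminus
        end, body = "$", body[:-1]
    parts = []
    for element in body.split("-"):
        m = _ELEMENT.fullmatch(element.strip())
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # excluded residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return start + "".join(parts) + end


def find_signature(consensus, sequence):
    """Return (0-based start, matched text) for each non-overlapping hit."""
    regex = re.compile(consensus_to_regex(consensus))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


# Hypothetical sequence containing the clusterin signature 1 motif.
hits = find_signature("C-K-P-C-L-K-x-T-C.", "AAACKPCLKGTCAAA")
```

Note that `re.finditer` reports non-overlapping matches only; a scanner that must report overlapping occurrences would need a lookahead-based search instead.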



NAME: Clusterin signature 2.

CONSENSUS: C-L-[RK]-M-[RK]-x-[EQ]-C-[ED]-K-C.

NAME: Connexins signature 1.

CONSENSUS: C-[DN]-T-x-Q-P-G-C-x(2)-V-C-Y-D.

NAME: Connexins signature 2.

CONSENSUS: C-x(3,4)-P-C-x(3)-[LIVM]-[DEN]-C-[FY]-[LIVM]-[SA]-[KR]-P.

NAME: Crystallins beta and gamma 'Greek key' motif signature.

CONSENSUS: [LIVMFYWA]-x-{DEHRKSTP}-[FY]-[DEQHKY]-x(3)-[FY]-x-G-x(4)-[LIVMFCST].

NAME: Dynamin family signature.

CONSENSUS: L-P-[RK]-G-[STN]-[GN]-[LIVM]-V-T-R.

NAME: Dynein light chain type 1 signature.

CONSENSUS: H-x-I-x-G-[KR]-x-F-[GA]-S-x-V-[ST]-[HY]-E.

NAME: FtsZ protein signature 1.

CONSENSUS: N-[ST]-D-x-Q-x-L-x(16,18)-G-x-G-[ATV]-G-[GSAN]-x-P-x(2)-G.

NAME: FtsZ protein signature 2.

CONSENSUS: [DNHKR]-[LIVMF]-x-[LIVMF](2)-[VSTAC]-[STAC]-G-x-G-[GK]-G-T-G-[ST]-G-[GSAR]-[STA]-P-[LIVMFT]-[LIVMF]-[SGAV].

NAME: Fungal hydrophobins signature- 

CONSENSUS : EGNID-EDNflPSA J-x-C-EGSTANK J-EGSTADNfllD-ESTNfilJ- 

EPTIVl-x-C-C-EDENflKPSTn). 

NAME: Intermediate filaments signature.

CONSENSUS: [IV]-x-[TACI]-Y-[RKH]-x-[LM]-L-[DE].

NAME: Involucrin signature.

CONSENSUS: <M-S-[QH]-Q-x-T-[LV]-P-V-T-[LV].

NAME: Kinesin motor domain signature.

CONSENSUS: [GSA]-[KRHPSTQVM]-[LIVMF]-x-[LIVMF]-[IVC]-D-L-[AH]-G-[SAN]-E.

NAME: Kinesin motor domain profile.

NAME: Kinesin light chain repeat.

CONSENSUS: [DEQR]-A-L-x(3)-[GEQ]-x(3)-G-x-[DNS]-x-P-x-V-A-x(3)-N-x-L-[AS]-x(5)-[QR]-x-[KR]-[FY]-x(2)-[AV]-x(4)-[HKNQ].

NAME: Myelin basic protein signature.

CONSENSUS: V-V-H-F-F-K-N.

NAME: Myelin P0 protein signature.

CONSENSUS: S-[KR]-S-x-K-[AG]-x-[SA]-E-K-K-[STA]-K.

NAME: Myelin proteolipid protein signature 1.

CONSENSUS: G-[MV]-A-L-F-C-G-C-G-H.

NAME: Myelin proteolipid protein signature 2.

CONSENSUS: C-x-[ST]-x-[DE]-x(3)-[ST]-[FY]-x-L-[FY]-I-x(4)-G-A.

NAME: Neuromodulin (GAP-43) signature 1.

CONSENSUS: <M-L-C-C-[LIVM]-R-R.

NAME: Neuromodulin (GAP-43) signature 2.

CONSENSUS: S-F-R-G-H-I-x-R-K-K-[LIVM].

NAME: Osteopontin signature.

CONSENSUS: [KQ]-x-[TA]-x(2)-[GA]-S-S-E-E-K.

NAME: Peripherin / rom-1 signature.

CONSENSUS: D-[GS]-V-P-F-[ST]-C-C-N-P-x-S-P-R-P-C.

NAME: Profilin signature.

CONSENSUS: <x(0,1)-[STA]-x(0,1)-W-[DENQH]-x-[YI]-x-[DEQ].

NAME: Surfactant associated polypeptide SP-C palmitoylation sites.

CONSENSUS: I-P-C-C-P-V.

NAME: Synapsins signature 1.

CONSENSUS: L-R-R-R-L-S-D-S.

NAME: Synapsins signature 2.

CONSENSUS: G-H-A-H-S-G-M-G-K-V-K.

NAME: Synaptobrevin signature.

CONSENSUS: N-[LIVM]-[DENS]-[KL]-V-x-[DEQ]-R-x(2)-[KR]-[LIVM]-[STDE]-x-[LIVM]-x-[DE]-[KR]-[TA]-[DE].

NAME: Synaptophysin / synaptoporin signature.

CONSENSUS: L-S-V-[DE]-C-x-N-K-T.

NAME: Tropomyosins signature.

CONSENSUS: L-K-E-A-E-x-R-A-E.

NAME: Tubulin subunits alpha, beta, and gamma signature.

CONSENSUS: [SAG]-G-G-T-G-[SA]-G.

NAME: Tubulin-beta mRNA autoregulation signal.

CONSENSUS: <M-R-[DE]-[IL].



NAME: Tau and MAP proteins tubulin-binding domain signature.

CONSENSUS: G-S-x(2)-N-x(2)-H-x-[PA]-[AG]-G(2).

NAME: Neuraxin and MAP1B proteins repeated region signature.

CONSENSUS: [STAGDN]-Y-x-Y-E-x(2)-[DE]-[KR]-[STAGCI].

NAME: F-actin capping protein alpha subunit signature 1.

CONSENSUS: V-H-[FY](2)-E-D-G-N-V.

NAME: F-actin capping protein alpha subunit signature 2.

CONSENSUS: F-K-[AE]-L-R-R-x-L-P.

NAME: F-actin capping protein beta subunit signature.

CONSENSUS: C-D-Y-N-R-D.
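Each entry in this listing pairs a NAME line with an optional CONSENSUS pattern; profile entries carry a name only, and long names or patterns may continue on following lines. As an illustration only, the sketch below shows one way such a listing could be read into (name, pattern) pairs. The function name `parse_signatures` and the sample text are assumptions made for this example, not part of the original disclosure.

```python
def parse_signatures(text):
    """Parse 'NAME:' / 'CONSENSUS:' entries into a list of (name, pattern) tuples.

    Continuation lines are appended to whichever field was read last.
    Entries without a CONSENSUS line (profiles) get an empty pattern.
    """
    records = []
    name_parts, pattern_parts = [], []
    in_pattern = False

    def flush():
        if name_parts:
            records.append((" ".join(name_parts), "".join(pattern_parts)))

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("NAME:"):
            flush()                                   # close the previous entry
            name_parts, pattern_parts = [line[5:].strip()], []
            in_pattern = False
        elif line.startswith("CONSENSUS:"):
            pattern_parts.append(line[10:].strip())
            in_pattern = True
        elif in_pattern:
            pattern_parts.append(line)                # wrapped pattern
        elif name_parts:
            name_parts.append(line)                   # wrapped name
    flush()
    return records


sample = """
NAME: Tropomyosins signature.
CONSENSUS: L-K-E-A-E-x-R-A-E.

NAME: Kinesin motor domain profile.

NAME: Vinculin family talin-binding region
signature.
CONSENSUS: [KR]-x-[LIVMF]-x(3)-[LIVMA]-x(2)-[LIVM]-x(6)-R-Q-
Q-E-L.
"""
entries = parse_signatures(sample)
```

Wrapped pattern lines are joined without a separator because, in this listing, a pattern is only ever broken after a hyphen.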



NAME: Vinculin family talin-binding region signature.

CONSENSUS: [KR]-x-[LIVMF]-x(3)-[LIVMA]-x(2)-[LIVM]-x(6)-R-Q-Q-E-L.

NAME: Vinculin repeated domain signature.

CONSENSUS: [LIVM]-x-[QA]-A-x(2)-W-[IL]-x-[DN]-P.

NAME: Amyloidogenic glycoprotein extracellular domain signature.

CONSENSUS: G-[VT]-E-[FY]-V-C-C-P.

NAME: Amyloidogenic glycoprotein intracellular domain signature.

CONSENSUS: G-Y-E-N-P-T-Y-[KR].

NAME: Cadherins extracellular repeated domain signature.

CONSENSUS: [LIV]-x-[LIV]-x-D-x-N-D-[NH]-x-P.



NAME: Insect cuticle proteins signature.

CONSENSUS: G-x(7)-[DEN]-G-x(6)-Y-x-A-[DNG]-x(2,3)-G-[FY]-x-[AP].

NAME: Gas vesicles protein GVPa signature 1.

CONSENSUS: [LIVM]-x-[DE]-[LIVMFYT]-[LIVM]-[DE]-x-[LIVM](2)-[DKR](2)-G-x-[LIVM](2).

NAME: Gas vesicles protein GVPa signature 2.

CONSENSUS: R-[LIVA](3)-A-[GS]-[LIVMFY]-x-T-x(3)-Y-[AG].

NAME: Gas vesicles protein GVPc repeated domain signature.

CONSENSUS: F-L-x(2)-T-x(3)-R-x(3)-A-x(2)-Q-x(3)-L-x(2)-F.

NAME: Bacterial microcompartments proteins signature.

CONSENSUS: D-x(0,1)-M-x-K-[SAG](2)-x-[IV]-x-[LIVM]-[LIVMA]-[GCS]-x(4)-[GD]-[SGPD]-[GA].

NAME: Flagella basal body rod proteins signature.

CONSENSUS: [GTARYQ]-x(9)-[LIVMYSTA](2)-[GSTA]-[STADEN]-N-[LIVM]-[SAN]-N-x-[SADNFR]-[STV].

NAME: Flagella transport protein fliP family signature 1.

CONSENSUS: [PA]-A-[FY]-x-[LIVT]-[STH]-[ES]-[LI]-x(2)-[GA]-F-[KREQ]-[IM]-G-[LIF].

NAME: Flagella transport protein fliP family signature 2.

CONSENSUS: P-[LIVMF]-K-[LIVMF](5)-x-[LIVMA]-[DNGS]-G-W.

NAME: Plant viruses icosahedral capsid proteins 'S' region signature.

CONSENSUS: [FYW]-x-[PSTA]-x(7)-G-x-[LIVM]-x-[LIVM]-x-[FYWI]-x(2)-D-x(5)-P.

NAME: Potexviruses and carlaviruses coat protein signature.

CONSENSUS: [RK]-[FYW]-A-[GAP]-F-D-x-F-x(2)-[LV]-x(3)-[GAST](2).

NAME: Neurotransmitter-gated ion-channels signature.

CONSENSUS: C-x-[LIVMFQ]-x-[LIVMF]-x(2)-[FY]-P-x-D-x(3)-C.

NAME: ATP P2X receptors signature.

CONSENSUS: G-G-x-[LIVM]-G-[LIVM]-x-[IV]-x-W-x-C-[DN]-L-D-x(5)-C-x-P-x-Y-x-F.

NAME: G-protein coupled receptors signature.

CONSENSUS: [GSTALIVMFYWC]-[GSTANCPDE]-{EDPKRH}-x(2)-[LIVMNQGA]-x(2)-[LIVMFT]-[GSTANC]-[LIVMFYWSTAC]-[DENH]-R-[FYWCSH]-x(2)-[LIVM].

NAME: G-protein coupled receptors family 2 signature 1.

CONSENSUS: C-x(3)-[FYWLIV]-D-x(3,4)-C-[FW]-x(2)-[STAGV]-x(8,9)-C-[PF].

NAME: G-protein coupled receptors family 2 signature 2.

CONSENSUS: Q-G-[LMFCA]-[LIVMFT]-[LIV]-x-[LIVFST]-[LIF]-[VFYH]-C-[LFY]-x-N-x(2)-V.

NAME: G-protein coupled receptors family 3 signature 1.

CONSENSUS: [LV]-x-N-[LIVM](2)-x-L-F-x-I-[PA]-Q-[LIVM]-[STA]-x-[STA](3)-[STAN].

NAME: G-protein coupled receptors family 3 signature 2.

CONSENSUS: C-C-[FYW]-x-C-x(2)-C-x(4)-[FYW]-x(2,4)-[DN]-x(2)-[STAH]-C-x(2)-C.

NAME: G-protein coupled receptors family 3 signature 3.

CONSENSUS: F-N-E-[STA]-K-x-I-[STAG]-F-[ST]-M.



NAME: Visual pigments (opsins) retinal binding site.

CONSENSUS: [LIVMWAC]-[PGAC]-x(3)-[SAC]-K-[STALIMR]-[GSACPNV]-[STACP]-x(2)-[DENF]-[AP]-x(2)-[IY].

NAME: Bacterial rhodopsins signature 1.

CONSENSUS: R-Y-x-[DT]-W-x-[LIVMF]-[ST]-T-P-[LIVM](3).

NAME: Bacterial rhodopsins retinal binding site.

CONSENSUS: [FYIV]-x-[FYVG]-[LIVM]-D-[LIVMF]-x-[STA]-K-x(2)-[FY].

NAME: Receptor tyrosine kinase class II signature.

CONSENSUS: [DN]-[LIV]-Y-x(3)-Y-Y-R.



NAME: Receptor tyrosine kinase class III signature.

CONSENSUS: G-x-H-x-N-[LIVM]-V-N-L-L-G-A-C-T.

NAME: Receptor tyrosine kinase class V signature 1.

CONSENSUS: F-x-[DN]-x-[GAW]-[GA]-C-[LIVM]-[SA]-[LIVM](2)-[SA]-[LV]-[KRHQ]-[LIVA]-x(3)-[KR]-C-[PSAW].

NAME: Receptor tyrosine kinase class V signature 2.

CONSENSUS: C-x(2)-[DE]-G-[DEQ]-W-x(2,3)-[PAQ]-[LIVMT]-[GT]-x-C-x-C-x(2)-G-[HFY]-[EQ].

NAME: Growth factor and cytokines receptors family signature 1.

CONSENSUS: C-[LVFYR]-x(7,8)-[STIVDN]-C-x-W.

NAME: Growth factor and cytokines receptors family signature 2.

CONSENSUS: [STGL]-x-W-[SG]-x-W-S.

NAME: TNFR/NGFR family cysteine-rich region signature.

CONSENSUS: C-x(4,6)-[FYH]-x(5,10)-C-x(0,2)-C-x(2,3)-C-x(7,11)-C-x(4,6)-[DNEQSKP]-x(2)-C.

NAME: TNFR/NGFR family cysteine-rich region domain.

NAME: Integrins alpha chain signature.

CONSENSUS: [FYWS]-[RK]-x-G-F-F-x-R.

NAME: Integrins beta chain cysteine-rich domain signature.

CONSENSUS: C-x-[GNQ]-x(1,3)-G-x-C-x-C-x(2)-C-x-C.

NAME: Natriuretic peptides receptors signature.

CONSENSUS: G-P-x-C-x-Y-x-A-A-x-V-x-R-x(3)-H-W.

NAME: Photosynthetic reaction center proteins signature.

CONSENSUS: [NH]-x(4)-P-x-H-x(2)-[SAG]-x(11)-[SAGC]-x-H-[SAG](2).

NAME: Antenna complexes alpha subunits signature.

CONSENSUS: [LIVFAG]-x-[GASV]-[LIVFA]-x-[IV]-H-x(3)-[LIVM]-[GSTAE]-[STANH]-x(1,3)-[STN]-W-[LIVMFYW].

NAME: Antenna complexes beta subunits signature.

CONSENSUS: [EQ]-x(4)-H-x(5)-[GSTA]-x(3)-[FY]-x(3)-[AG]-x(2)-[AV]-H-x(7)-P.

NAME: Photosystem I psaA and psaB proteins signature.

CONSENSUS: C-D-G-P-G-R-G-G-T-C.



NAME: Photosystem I psaG and psaK proteins signature.

CONSENSUS: G-F-x-[LIVM]-x-[DEA]-x(2)-[GA]-x-[GTA]-[SA]-x-G-H-x-[LIVM]-[GA].

NAME: Phytochrome chromophore attachment site signature.

CONSENSUS: [RGS]-[GSA]-[PV]-H-x-C-H-x(2)-Y.

NAME: Phytochrome chromophore attachment site domain profile.



NAME: Speract receptor repeated domain signature.

CONSENSUS: G-x(5)-G-x(2)-E-x(6)-W-G-x(2)-C-x(3)-[FYW]-x(8)-C-x(3)-G.

NAME: TonB-dependent receptor proteins signature 1.

CONSENSUS: <x(10,115)-[DENF]-[ST]-[LIVMF]-[LIVSTEQ]-V-x-[AGP]-[STANEQPK].

NAME: TonB-dependent receptor proteins signature 2.

CONSENSUS: [LYGSTANE]-x(3)-[GSTAENQ]-x-[PGE]-R-x-[LIVFYWA]-x-[LIVMFTA]-[STAGNQ]-[LIVMFYGTA]-x-[LIVMFYWGTADQ]-x-F>.

NAME: Transmembrane 4 family signature.

CONSENSUS: G-x(3)-[LIVMF]-x(2)-[GSA]-[LIVMF](2)-G-C-x-[GA]-[STA]-x(2)-[EG]-x(2)-[CWN]-[LIVM](2).

NAME: Bacterial chemotaxis sensory transducers signature.

CONSENSUS: R-T-E-[EQ]-Q-x(2)-[SA]-[LIVM]-x-[EQ]-T-A-A-S-M-E-Q-L-T-A-T-V.

NAME: ER lumen protein retaining receptor signature 1.

CONSENSUS: G-I-S-x-[KR]-x-Q-x-L-[FY]-x-[LIV](2)-F-x(2)-R-Y.

NAME: ER lumen protein retaining receptor signature 2.

CONSENSUS: L-E-[SA]-V-A-I-[LM]-P-Q-L.

NAME: Ephrins signature.

CONSENSUS: [KRQ]-[LF]-[CST]-x-K-[IF]-Q-x-[FY]-[ST]-[PA]-x(3)-G-x-E-F-x(5)-[FY](2)-x(2)-[SA].

NAME: Granulins signature.

CONSENSUS: C-x-D-x(2)-H-C-C-P-x(4)-C.

NAME: HBGF/FGF family signature.

CONSENSUS: G-x-L-x-[STAGP]-x(6,7)-[DE]-C-x-[FM]-x-E-x(6)-Y.

NAME: PTN/MK heparin-binding protein family signature 1.

CONSENSUS: S-[DE]-C-x-[DE]-W-x-W-x(2)-C-x-P-x-[SN]-x-D-C-G-[LIVMA]-G-x-R-E-G.

NAME: PTN/MK heparin-binding protein family signature 2.

CONSENSUS: C-[KR]-[LIVM]-P-C-N-W-K-K-x-F-G-A-[DE]-C-K-Y-x-F-[EQ]-x-W-G-x-C.

NAME: Nerve growth factor family signature.

CONSENSUS: G-C-[KR]-G-[LIV]-[DE]-x(3)-[YW]-x-S-x-C.

NAME: Platelet-derived growth factor (PDGF) family signature.

CONSENSUS: P-[PS]-C-V-x(3)-R-C-[GSTA]-G-C-C.

NAME: Small cytokines (intercrine/chemokine) C-x-C subfamily signature.

CONSENSUS: C-x-C-[LIVM]-x(5,6)-[LIVMFY]-x(2)-[RKSEQ]-x-[LIVM]-x(2)-[LIVM]-x(5)-[SAG]-x(2)-C-x(3)-[EQ]-[LIVM](2)-x( )-C-L-[DN].

NAME: Small cytokines (intercrine/chemokine) C-C subfamily signature.

CONSENSUS: C-C-[LIFYT]-x(5,6)-[LI]-x(4)-[LIVMF]-x(2)-[FYW]-x(6,8)-C-x(3,4)-[SAG]-[LIVM](2)-[FL]-x(8)-C-[STA].

NAME: TGF-beta family signature.

CONSENSUS: [LIVM]-x(2)-P-x(2)-[FY]-x(4)-C-x-G-x-C.

NAME: TNF family signature.

CONSENSUS: [LV]-x-[LIVM]-x(3)-G-[LIVMF]-Y-[LIVMFY](2)-x(2)-[QEKHL]-[LIVMGT]-x-[LIVMFY].

NAME: TNF family profile.

NAME: Wnt-1 family signature.

CONSENSUS: C-K-C-H-G-[LIVMT]-S-G-x-C.

NAME: Interferon alpha, beta and delta family signature.

CONSENSUS: [FYH]-[FY]-x-[GNRC]-[LIVM]-x(2)-[FY]-L-x(7)-[CY]-A-W.

NAME: Granulocyte-macrophage colony-stimulating factor signature.

CONSENSUS: C-P-[LP]-T-x-E-[ST]-x-C.

NAME: Interleukin-1 signature.

CONSENSUS: [FC]-x-S-[ASLV]-x(2)-P-x(2)-[FYLIV]-[LI]-[SCA]-T-x(7)-[LIVM].

NAME: Interleukin-2 signature.

CONSENSUS: T-E-[LF]-x(2)-L-x-C-L-x(2)-E-L.

NAME: Interleukins -4 and -13 signature.

CONSENSUS: L-x-E-[LIVM](2)-x(4,5)-[LIVM]-[TL]-x(5,7)-C-x(4)-[IVA]-x-[DNS]-[LIVMA].

NAME: Interleukin-6 / G-CSF / MGF signature.

CONSENSUS: C-x(9)-C-x(6)-G-L-x(2)-[FY]-x(3)-L.

NAME: Interleukin-7 and -9 signature.

CONSENSUS: N-x-[LAP]-[SCT]-F-L-K-x-L-L.

NAME: Interleukin-10 family signature.

CONSENSUS: [GS]-C-x(2)-[LV]-x(2)-[LIVM](2)-x-F-Y-L-x(2)-V.

NAME: LIF / OSM family signature.

CONSENSUS: [PST]-x(4)-F-[NQ]-x-K-x(3)-C-x-[LF]-L-x(2)-Y-[HK].



NAME: Macrophage migration inhibitory factor family signature.

CONSENSUS: [DE]-P-C-A-x(3)-[LIVM]-x-S-I-G-x-[LIVM]-G.

NAME: Adipokinetic hormone family signature.

CONSENSUS: Q-[LV]-[NT]-[FY]-[ST]-x(2)-W.

NAME: Bombesin-like peptides family signature.

CONSENSUS: W-A-x-G-[SH]-[LF]-M.

NAME: Calcitonin / CGRP / IAPP family signature.

CONSENSUS: C-[SAGDN]-[STN]-x(0,1)-[SA]-T-C-[VMA]-x(3)-[LYF]-x(3)-[LYF].

NAME: Corticotropin-releasing factor family signature.

CONSENSUS: [PQ]-x-[LIVM]-S-[LIVM]-x(2)-[PST]-[LIVMF]-x-[LIVM]-L-R-x(2)-[LIVM].

NAME: Crustacean CHH/MIH/GIH neurohormones family signature. 

5 CONSENSUS: C-EDENO-D-C-x-N-ILIVJ-EFYl-R-x ( 7) -C-EKRJ-x (2 ) -C - 

NAME: Erythropoietin / thrombopoietin signature.

CONSENSUS: P-x(4)-C-D-x-R-[LIVM](2)-x-[KR]-x(14)-C.

NAME: Granins signature 1.

CONSENSUS: [DE]-[SN]-L-[SAN]-x(2)-[DE]-x-E-L.

NAME: Granins signature 2.

CONSENSUS: C-[LIVM](2)-E-[LIVM](2)-S-[DN]-[STA]-L-x-K-x-S-x(3)-[LIVM]-[STA]-x-E-C.

NAME: Galanin signature.

CONSENSUS: G-W-T-L-N-S-A-G-Y-L-L-G-P-H.

NAME: Gastrin / cholecystokinin family signature.

CONSENSUS: Y-x(0,1)-[GD]-[WH]-M-[DR]-F.

NAME: Glucagon / GIP / secretin / VIP family signature.

CONSENSUS: [YH]-[STAIVGD]-[DEQ]-[AGF]-[LIVMSTE]-[FYLR]-x-[DENSTAK]-[DENSTA]-[LIVMFYG]-x( )-[KREQL]-[KRDENQL]-[LVFYWG]-[LIVQ].

NAME: Glycoprotein hormones alpha chain signature 1.

CONSENSUS: C-x-G-C-C-[FY]-S-R-A-[FY]-P-T-P.

NAME: Glycoprotein hormones alpha chain signature 2.

CONSENSUS: N-H-T-x-C-x-C-x-T-C-x(2)-H-K.



NAME: Glycoprotein hormones beta chain signature 1.

CONSENSUS: C-[STAGM]-G-[HFYL]-C-x-[ST].

NAME: Glycoprotein hormones beta chain signature 2.

CONSENSUS: [PA]-V-A-x(2)-C-x-C-x(2)-C-x(4)-[STD]-[DEY]-C-x(6,8)-[PGSTAVM]-x(2)-C.

NAME: Gonadotropin-releasing hormones signature.

CONSENSUS: Q-H-[FYW]-S-x(4)-P-G.

NAME: Insulin family signature.

CONSENSUS: C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C.

NAME: Natriuretic peptides signature.

CONSENSUS: C-F-G-x(3)-D-R-I-x(3)-S-x(2)-G-C.

NAME: Neurohypophysial hormones signature.

CONSENSUS: C-[LIFY](2)-x-N-[CS]-P-x-G.



NAME: Neuromedin U signature.

CONSENSUS: F-[LIVMF]-F-R-P-R-N.

NAME: Endogenous opioids neuropeptides precursors signature.

CONSENSUS: C-x(3)-C-x(2)-C-x(2)-[KRH]-x(6,7)-[LIF]-[DN]-x(3)-C-x-[LIVM]-[EQ]-C-[EQ]-x(8)-W-x(2)-C.

NAME: Pancreatic hormone family signature.

CONSENSUS: [FY]-x(3)-[LIVM]-x(2)-Y-x(3)-[LIVMFY]-x-R-x-R-[YF].

NAME: Parathyroid hormone family signature.

CONSENSUS: V-S-E-x-Q-x(2)-H-x(2)-G.

NAME: Pyrokinins signature.

CONSENSUS: F-[GSTV]-P-R-L-[G>].

NAME: Somatotropin, prolactin and related hormones signature 1.

CONSENSUS: C-x-[ST]-x(2)-[LIVMFY]-x-[LIVMSTA]-P-x(5)-[TALIV]-x(7)-[LIVMFY]-x(6)-[LIVMFY]-x(2)-[STA]-W.

NAME: Somatotropin, prolactin and related hormones signature 2.

CONSENSUS: C-[LIVMFY]-x(2)-D-[LIVMFYSTA]-x(5)-[LIVMFY]-x(2)-[LIVMFYT]-x(2)-C.

NAME: Tachykinin family signature.

CONSENSUS: F-[IVFY]-G-[LM]-M-[G>].

NAME: Thymosin beta-4 family signature.

CONSENSUS: K-L-K-K-T-E-T-Q-E-K-N.

NAME: Urotensin II signature.

CONSENSUS: C-F-W-K-Y-C.

NAME: Cecropin family signature.

CONSENSUS: W-x(0,2)-[KPN]-x(2)-K-[KRE]-[LI]-E-[RKN].

NAME: Mammalian defensins signature.

CONSENSUS: C-x-C-x(3,5)-C-x(7)-G-x-C-x(9)-C-C.



NAME: Arthropod defensins signature.

CONSENSUS: C-x(2,3)-[HN]-C-x(3,4)-[GR]-x(2)-G-G-x-C-x(4,7)-C-x-C.

NAME: Cathelicidins signature 1.

CONSENSUS: Y-x-[ED]-x-V-x-[RQ]-A-[LIVMA]-[DQG]-x-[LIVMFY]-N-[EQ].

NAME: Cathelicidins signature 2.

CONSENSUS: F-x-[LIVM]-K-E-T-x-C-x(10)-C-x-F-[KR]-[KE].

NAME: Endothelin family signature.

CONSENSUS: C-x-C-x(4)-D-x(2)-C-x(2)-[FY]-C.

NAME: Plant thionins signature.

CONSENSUS: C-C-x(5)-R-x(2)-[FY]-x(2)-C.

NAME: Gamma-thionins family signature.

CONSENSUS: [KR]-x-C-x(3)-[SV]-x(2)-[FYWH]-x-[GF]-x-C-x(5)-C-x(3)-C.

NAME: Snake toxins signature.

CONSENSUS: G-C-x(1,3)-C-P-x(8,10)-C-C-x(2)-

NAME: Myotoxins signature.

CONSENSUS: K-x-C-H-x-K-x(2)-H-C-x(2)-K-x(3)-C-x(8)-K-x(2)-C-x(2)-[RK]-x-K-C-C-K-K.

NAME: Scorpion short toxins signature.

CONSENSUS: C-x(3)-C-x(6,9)-[GAS]-K-C-[IMQT]-x(3)-C-x-C.

NAME: Heat-stable enterotoxins signature.

CONSENSUS: C-C-x(2)-C-C-x-P-A-C-x-G-C.

NAME: Aerolysin type toxins signature.

CONSENSUS: [KT]-x(2)-N-W-x(2)-T-[DN]-T.

NAME: Shiga/ricin ribosomal inactivating toxins active site signature.

CONSENSUS: [LIVMA]-x-[LIVMSTA](2)-x-E-[SAGV]-[STAL]-R-[FY]-[RKNQS]-x-[LIVM]-[EQS]-x(2)-[LIVMF].

NAME: Channel forming colicins signature.

CONSENSUS: T-x(2)-W-x-P-[LIVMFY](3)-x(2)-E.

NAME: Hok/gef family cell toxic proteins signature.

CONSENSUS: [LIVMA](4)-C-[LIVMFA]-T-[LIVMA](2)-x(4)-[LIVM]-x-[RG]-x(2)-L-[CY].

NAME: Staphylococcal enterotoxin/Streptococcal pyrogenic exotoxin signature 1.

CONSENSUS: Y-G-G-[LIV]-T-x(4)-N.

NAME: Staphylococcal enterotoxin/Streptococcal pyrogenic exotoxin signature 2.

CONSENSUS: K-x(2)-[LIV]-x(4)-[LIV]-D-x(3)-R-x(2)-L-x(5)-[LIV]-Y.

NAME: Thiol-activated cytolysins signature.

CONSENSUS: [RK]-E-C-T-G-L-x-W-E-W-W-[RK].

NAME: Membrane attack complex components / perforin signature.

CONSENSUS: Y-x(6)-[FY]-G-T-H-[FY].

NAME: Pancreatic trypsin inhibitor (Kunitz) family signature.

CONSENSUS: F-x(3)-G-C-x(6)-[FY]-x(5)-C.

NAME: Bowman-Birk serine protease inhibitors family signature.

CONSENSUS: C-x(5,6)-[DENQKRHSTA]-C-[PASTDH]-[PASTDK]-[ASTDV]-C-[NDKS]-[DEKRHSTA]-C.

NAME: Kazal serine protease inhibitors family signature.

CONSENSUS: C-x(7)-C-x(6)-Y-x(3)-C-x(2,3)-C.

NAME: Soybean trypsin inhibitor (Kunitz) protease inhibitors family signature.

CONSENSUS: [LIVM]-x-D-x-[EDNTY]-[DG]-[RKHDENQ]-x-[LIVM]-x(5)-Y-x-[LIVM].

NAME: Serpins signature.

CONSENSUS: [LIVMFY]-x-[LIVMFYAC]-[DNQ]-[RKHQS]-[PST]-F-[LIVMFY]-[LIVMFYC]-x-[LIVMFAH].

NAME: Potato inhibitor I family signature.

CONSENSUS: [FYW]-P-[EQH]-[LIV](2)-G-x(2)-[STAGV]-x(2)-A.

NAME: Squash family of serine protease inhibitors signature.

CONSENSUS: C-P-x(5)-C-x(2)-D-x-D-C-x(3)-C-x-C.

NAME: Streptomyces subtilisin-type inhibitors signature.

CONSENSUS: C-x-P-x(2,3)-G-x-H-P-x( )-A-C-[ATD]-x-L.

NAME: Cysteine proteases inhibitors signature.

CONSENSUS: [GSTEQKRV]-Q-[LIVT]-[VAF]-[SAGQ]-G-x-[LIVMNK]-x(2)-[LIVMFY]-x-[LIVMFYA]-[DENQKRHSIV].

NAME: Tissue inhibitors of metalloproteinases signature.

CONSENSUS: C-x-C-x-P-x-H-P-Q-x-A-F-C.

NAME: Cereal trypsin/alpha-amylase inhibitors family signature.

CONSENSUS: C-x(4)-[SAGD]-x(4)-[SPAL]-[LF]-x(2)-C-[RH]-x-[LIVMFY](2)-x(3,4)-C.

NAME: Alpha-2-macroglobulin family thiolester region signature.

CONSENSUS: [PG]-x-[GS]-C-[GA]-E-[EQ]-x-[LIVM].

NAME: Disintegrins signature.

CONSENSUS: C-x(2)-G-x-C-C-x-[NQRS]-C-x-[FM]-x(6)-C-[RK].

NAME: Lambdoid phages regulatory protein CIII signature.

CONSENSUS: E-S-x-L-x-R-x(2)-[KR]-x-L-x(4)-[KR](2)-x(2)-[DE]-x-L.

NAME: Chaperonins cpn60 signature.

CONSENSUS: A-[AS]-x-[DEQ]-E-x( )-G-G-[GA].

NAME: Chaperonins cpn10 signature.

CONSENSUS: [LIVMFY]-x-P-[ILT]-x-[DEN]-[KR]-[LIVMFA](3)-[KREQ]-x(8,9)-[SG]-x-[LIVMFY](3).

NAME: Chaperonins TCP-1 signature 1.

CONSENSUS: [RKEL]-[ST]-x-[LMFY]-G-P-x-[GSA]-x-x-K-[LIVMF](2).

NAME: Chaperonins TCP-1 signature 2.

CONSENSUS: [LIVM]-[TS]-[NK]-D-[GA]-[AVNHK]-[TAV]-[LIVM](2)-x(2)-[LIVM]-x-[LIVM]-x-[SNH]-[PQH].

NAME: Chaperonins TCP-1 signature 3.

CONSENSUS: Q-[DEK]-x-x-[LIVMGTA]-[GA]-D-G-T.

NAME: Heat shock hsp20 proteins family profile.

NAME: Heat shock hsp70 proteins family signature 1.

CONSENSUS: [IV]-D-L-G-T-[ST]-x-[SC].

NAME: Heat shock hsp70 proteins family signature 2.

CONSENSUS: [LIVMF]-[LIVMFY]-[DN]-[LIVMFS]-G-[GSH]-[GS]-[AST]-x(3)-[ST]-[LIVM]-[LIVMFC].

NAME: Heat shock hsp70 proteins family signature 3.

CONSENSUS: [LIVMY]-x-[LIVMF]-x-G-G-x-[ST]-x-[LIVM]-P-x-[LIVM]-x-[DEQKRSTA].

NAME: Heat shock hsp90 proteins family signature.

CONSENSUS: Y-x-[NQH]-K-[DE]-[IVA]-F-L-R-[ED].

NAME: Chaperonins clpA/B signature 1.

CONSENSUS: D-[AI]-[SGA]-N-[LIVMF](2)-K-[PT]-x-L-x(2)-G.

NAME: Chaperonins clpA/B signature 2.

CONSENSUS: R-[LIVMFY]-D-x-S-E-[LIVMFY]-x-E-[KRQ]-x-[STA]-x-[STA]-[KR]-[LIVM]-x-G-[STA].

NAME: Nt-dnaJ domain signature.

CONSENSUS: [FY]-x(2)-[LIVMA]-x(3)-[FYWHNT]-[DENQSA]-x-L-x-[DN]-x(3)-[KR]-x(2)-[FYI].



NAME: dnaJ domain profile.

NAME: CXXCXGXG dnaJ domain signature.

CONSENSUS: C-[DEGSTHKR]-x-C-x-G-x-[GK]-[AGSDM]-x(2)-[GSNKR]-x(4,6)-C-x(2,3)-C-x-G-x-G.

NAME: grpE protein signature.

CONSENSUS: [FL]-[DN]-[PHEA]-x(2)-[HM]-x-A-[LIVMTN]-x(16,20)-G-[FY]-x(3)-[DEG]-x(2)-[LIVM]-[RI]-x-[SA]-x-V-x-[IV].

NAME: Bacterial type II secretion system protein C signature.

CONSENSUS: P-x(6)-F-x(4)-L-x(3)-D-[LIVM]-A-[LIVM]-x-[LIVM]-N-x-[LIVM]-x-L.

NAME: Bacterial type II secretion system protein D signature.

CONSENSUS: [GR]-[DEQKG]-[STVM]-[LIVMA](3)-[GA]-G-[LIVMFY]-x(11)-[LIVM]-P-[LIVMFYWGS]-[LIVMF]-[GSAE]-x-[LIVM]-P-[LIVMFYW](2)-x(2)-[LV]-F.

NAME: Bacterial type II secretion system protein E signature.

CONSENSUS: [LIVM]-R-x(2)-P-D-x-[LIVM](3)-G-E-[LIVM]-R-D.

NAME: Bacterial type II secretion system protein F signature.

5 CONSENSUS: EKRtfJ-ELIVMAID-x (2 ) -CSAIV3-CLIVri3-x-CTY3-P-x ( S ) - 

ELIVM2-x(3>-ESTAGV:B-xCfc,)- 
CONSENSUS: ELMY3-X ( 3 ) -ELIVMFID ( 2 ) -P - 

NAME: Bacterial type II secretion system protein N signature.

CONSENSUS: G-T-L-W-x-G-x(11)-L-x(4)-W.

NAME: Bacterial export FHIPEP family signature.

CONSENSUS: R-[LIVM]-[GSA]-E-V-[GSA]-A-R-F-[STV]-L-D-[GSA]-M-P-G-K-Q-M-[GSA]-I-D-[GSA]-D.



NAME: Protein secA signature.

CONSENSUS: [IV]-x-[IV]-[SA]-T-[NQ]-M-A-G-R-G-x-D-I-x-L.

NAME: Protein secY signature 1.

CONSENSUS: [GST]-[LIVMF](2)-x-[LIVM]-G-[LIVM]-x-P-[LIVMFY](2)-x-[AS]-[GSTQ]-[LIVMFAT](3)-Q-[LIVMFA](2).

NAME: Protein secY signature 2.

CONSENSUS: [LIVMFYW](2)-x-[DE]-x-[LIVMF]-[STN]-x(2)-G-[LIVMF]-[GST]-[NST]-G-x-[GST]-[LIVMF](3).

NAME: Protein secE/sec61-gamma signature.

CONSENSUS: [LIVMFY]-x(2)-[DENQGA]-x(4)-[LIVMTA]-x-[KRV]-x(2)-[KW]-P-x(3)-[SEQ]-x(7)-[LIVT]-[LIVGA]-[LIVFGAST].

NAME: Gram-negative pili assembly chaperone signature.

CONSENSUS: [LIVMFY]-[APN]-x-[DNS]-[KREQ]-E-[STR]-[LIVMAR]-x-[FYWT]-x-[NC]-[LIVM]-x(2)-[LIVM]-P-[PAS].

NAME: Fimbrial biogenesis outer membrane usher protein signature.

CONSENSUS: [VL]-[PASQ]-[PAS]-G-[PAD]-[FY]-x-[LI]-[DNQSTAP]-[DNH]-[LIVMFY].

NAME: SRP54-type proteins GTP-binding domain signature.

CONSENSUS: P-[LIVM]-x-[FYL]-[LIVMAT]-[GS]-x-[GS]-[EQ]-x(4)-[LIVMF].

NAME: Cytochrome c oxidase assembly factor COX10/ctaB/cyoE signature.

CONSENSUS: [ED]-x-D-x(2)-M-x-R-T-x(2)-R-x(4)-G.

NAME: Cyclin-dependent kinases regulatory subunits signature 1.

CONSENSUS: Y-S-x-[KR]-Y-x-[DE](2)-x-[FY]-E-Y-R-H-V-x-[LV]-[PT]-[KRP].

NAME: Cyclin-dependent kinases regulatory subunits signature 2.

CONSENSUS: H-x-P-E-x-H-[IV]-L-L-F-[KR].

NAME: Pentaxin family signature.

CONSENSUS: H-x-C-x-[ST]-W-x-[ST].

NAME: Immunoglobulins and major histocompatibility complex proteins signature.

CONSENSUS: [FY]-x-C-x-[VA]-x-H.

NAME: Prion protein signature 1.

CONSENSUS: A-G-A-A-A-A-G-A-V-V-G-G-L-G-G-Y.

NAME: Prion protein signature 2.

CONSENSUS: E-x-[ED]-x-K-[LIVM](2)-x-[KR]-[LIVM](2)-x-[QE]-M-C-x(2)-Q-Y.

NAME: Cyclins signature.

CONSENSUS: R-x(2)-[LIVMSA]-x(2)-[FYWS]-[LIVM]-x(8)-[LIVMFC]-x(4)-[LIVMFYA]-x(2)-[STAGC]-[LIVMFYQ]-x-[LIVMFYC]-[LIVMFY]-D-[RKH]-[LIVMFYW].

NAME: Proliferating cell nuclear antigen signature 1.

CONSENSUS: [GA]-[LIVMF]-x-[LIVMA]-x-[SAV]-[LIVM]-D-x-[NSAE]-[HKR]-[VI]-x-[LY]-[VGA]-x-[LIVM]-x-[LIVM]-x(4)-F.

NAME: Proliferating cell nuclear antigen signature 2.

CONSENSUS: [RKA]-C-[DE]-[RH]-x(3)-[LIVMF]-x(3)-[LIVM]-x-[SGAN]-[LIVMF]-x-K-[LIVMF](2).

NAME: Actin-depolymerizing proteins signature.

CONSENSUS: P-[DE]-x-[SA]-x-[LIVMT]-[KR]-x-[KR]-M-[LIVM]-[YA]-[STA](3)-x(3)-[LIVMF]-[KR].

NAME: BCL2-like apoptosis inhibitors (spans part of BH3, BH1 and BH2).

NAME: Apoptosis regulator, Bcl-2 family BH1 domain signature.

CONSENSUS: [LVME]-[FT]-x-[GSD]-[GL]-x(1,2)-[NS]-[YW]-G-R-[LIV]-[LIVC]-[GAT]-[LIVMF](2)-x-F-[GSAE]-[GSARY].

NAME: Apoptosis regulator, Bcl-2 family BH2 domain signature.

CONSENSUS: W-[LIM]-x(3)-[GR]-G-[WQ]-[DENSAV]-x-[FLGA]-[LIVFTC].

NAME: Apoptosis regulator, Bcl-2 family BH3 domain signature.

CONSENSUS: [LIVAT]-x(3)-L-[KARQ]-x-[IVAL]-G-D-[DESG]-[LIMFV]-[DENSHQ]-[LVSHRQ]-[NSR].

NAME: Apoptosis regulator, Bcl-2 family BH4 domain signature.

CONSENSUS: [DS]-[NT]-R-[AE]-[LI]-V-x-[KD]-[FY]-[LIV]-[GHS]-Y-K-L-[SR]-Q-[RK]-G-[HY]-x-[CW].

NAME: Apoptosis regulator, Bcl-2 family BH4 domain profile.

NAME: Arrestins signature.

CONSENSUS: [FY]-R-Y-G-x-[DE](2)-x-[DE]-[LIVM](2)-G-[LIVM]-x-F-x-[RK]-[DEQ]-[LIVM].

NAME: AAA-protein family signature.

CONSENSUS: [LIVMT]-x-[LIVMT]-[LIVMF]-x-[GATMC]-[ST]-[NS]-x(4)-[LIVM]-D-x-A-[LIFA]-x-R.

NAME: Ubiquitin domain signature.

CONSENSUS: K-x(2)-[LIVM]-x-[DESAK]-x(3)-[LIVM]-[PA]-x(3)-Q-x-[LIVM]-[LIVMC]-[LIVMFY]-x-G-x(4)-[DE].

NAME: Ubiquitin domain profile.

NAME: ADP-ribosylation factors family signature.

CONSENSUS: [HRQT]-x-[FYWI]-x-[LIVM]-x(4)-A-x(2)-G-x(2)-[LIVM]-x(2)-[GSA]-[LIVMF]-x-[WK]-[LIVM].

NAME: GTP-binding nuclear protein ran signature.

CONSENSUS: D-T-A-G-Q-E-K-[LF]-G-G-L-R-[DE]-G-Y-Y.

NAME: SAR1 family signature.

CONSENSUS: R-x-[LIVM]-E-V-F-M-C-S-[LIVM](2)-x-[KRQ]-x-G-Y-x-E-[AG]-[FI]-x-W-[LIVM]-x-Q-Y.

NAME: Band 7 protein family signature.

CONSENSUS: R-x(2)-[LIV]-[SAN]-x(6)-[LIV]-D-x(2)-T-x(2)-W-G-[LIV]-[KRH]-[LIV]-x-[KR]-[LIV]-E-[LIV]-[KR].

NAME: Trp-Asp (WD) repeats signature.

CONSENSUS: [LIVMSTAC]-[LIVMFYWSTAGC]-[LIMSTAG]-[LIVMSTAGC]-x(2)-[DN]-x(2)-[LIVMWSTAC]-x-[LIVMFSTAG]-W-[DEN]-[LIVMFSTAGCN].

NAME: G-protein gamma subunit profile.

NAME: Ras GTPase-activating proteins signature.

CONSENSUS: [GSN]-x-[LIVMF]-[FY]-[LIVMFY]-R-[LIVMFY](2)-[GACN]-P-[AV]-[LIV](2)-[SGAN]-P.

NAME: Ras GTPase-activating proteins profile.

NAME: Guanine-nucleotide dissociation stimulators CDC24 family signature.

CONSENSUS: L-x(2)-[LIVMFYW]-L-x(2)-P-[LIVM]-x(2)-[LIVM]-x-[KRS]-x(2)-L-x-[LIVM]-x-[DEQ]-[LIVM]-x(3)-[ST].

NAME: Guanine-nucleotide dissociation stimulators CDC25 family signature.

CONSENSUS: [GAP]-[CT]-V-P-[FY]-x(4)-[LIVMFY]-x-[DN]-[LIVM].

NAME: MARCKS family signature 1.

CONSENSUS: G-Q-E-N-G-H-V-[KR].

NAME: MARCKS family phosphorylation site domain.

CONSENSUS: E-T-P-K(5)-x(0,1)-F-S-F-K-K-x-F-K-L-S-G-x-S-F-K-[KR]-[NS]-[KR]-K-E.

NAME: Stathmin family signature 1.

CONSENSUS: P-[KQ]-[KR](2)-[DE]-x-S-L-[EG]-E.

NAME: Stathmin family signature 2.

CONSENSUS: A-E-K-R-E-H-E-[KR]-E-V.

NAME: GTP-binding elongation factors signature. 
25 CONSENSUS: D-EKRSTGANflFYlO-x (3) -E-EKRAfl J-x-ERKtfDJ-EGCD- 

CIVriKll-ESTl-ILIV3-x(2)- 
CONSENSUS: EGSTACKRNfiJ - 

NAME: Elongation factor 1 beta/beta r /del ta chain signature 
30 1. 

CONSENSUS : EDE3-EDEG J-EDEID ( 2 ) -ELIVMFID-D-L-F-G . 

NAME: Elongation factor 1 beta/beta 1 /del ta chain signature 
2- 

35 -CONSENSUS: V-(2--S-x-D-ELIVri3-x-A-EFUn3-ENta3-K-ELIVH3 . 

NAME: Elongation factor 1 gamma chain profile. 

NAME: Elongation factor Ts signature 1- 

40 CONSENSUS: L-R-x (2) -T-EGDtO-x-EGSH-ELIVIIFJ-x (0 1) -EDENKAO-x 

K-EKRNEfiSH-EAVJ-L- 

NAME : Elongation factor Ts signature 2- 

CONSENSUS: E-ELIVMID-N-ESCVl-EdEIB-T-D-F-V-ESA J-EKRNJ . 


NAME: Elongation factor P signature.

CONSENSUS: K-x-A-x(4)-G-x(2)-[LIV]-x-V-P-x(2)-[LIV]-x(2)-G.

NAME: Eukaryotic initiation factor 1A signature.

50 CONSENSUS: EinJ-x-G-x-EGSJ-EKRHID-x ( 4 ) -ECLD-x-D-G-x (2) -R-x ( 2 ) 

ERHJ-I-x-G- 

NAME: Eukaryotic initiation factor 4E signature.

CONSENSUS: EDEJ-EIFYJ-x(2)-F-EKR3-x<2)-ELIVM3-x-P-x-U-E-EDV3 
55 x(5)-G-G-EKRID-li). 

NAME: Eukaryotic initiation factor 5A hypusine signature.

CONSENSUS: [PT]-G-K-H-G-x-A-K.

NAME: Initiation factor 2 signature.

CONSENSUS: G-x-ELI VM3-X ( 5 ) -L-EKR3-EKRHNS3-x-K-x ( 5 ) -ELI VM3- 

x(2)-G-x-EDEN3-C-G- 


NAME: Initiation factor 3 signature- 

CONSENSUS: EKR3-ELIVM3 ( 2 > -EDN3-EFY3-EGSN3-EKR 3-ELIVMF YS3-X- 

EFY3-EDEt2T3-xC2)-EKR3. 

10 NAME: Translation initiation factor SUI1 signature. 

CONSENSUS: ELIVn3-EEO-ELIVM3-<3-G-EI>EN3-EKH<23-EKRV3 - 

NAME: Prokaryotic-type class I peptide chain release factors
signature.

15 CONSENSUS: EAR3-ESTA3-x-G-x-G-G-(3-EHNGCS3-V-N-x < 3 ) -EST J- A- 

EIV3- 






NAME: Transcription termination factor nusG signature- 

CONSENSUS: ELIVM3-F-G-EKRIO-X-T-P-EIV3-X-ELIVM3 . 

NAME: Calponin family repeat- 

CONSENSUS: ELIVn3-x-ELS3-S-EMAS3-G-ESTY3-ENT3-EKR<23-x (2 ) - 

ESTN3-(2-x-G-x(3i4)-G- 

25 NAME: CAP protein signature 1- 

CONSENSUS : ELIVM 3 ( 2 ) -X-R-L-EDE3-X ( M ) -R-L-E - 



NAME: CAP protein signature 2.

CONSENSUS: D-[LIVMFY]-x-E-x-[PA]-x-P-E-Q-[LIVMFY]-K.

NAME: Calreticulin family signature 1.

CONSENSUS: EKRHN3-x-EDE<3N3-EDE(2NK3-x ( 3 ) -C-G-G-E AG3-EFY3- 

ELIVMJ-EKN3-£LIVMFYJ(2> - 



NAME: Calreticulin family signature 2.
CONSENSUS: [LIVM](2)-F-G-P-D-x-C-[AG].



NAME: Calreticulin family repeated motif signature- 
CONSENSUS : EIV3-x-D-x-EDENST3-x(2)-K-P-EI>EH3-I>-liJ-EI>EN3- 

NAME: Calsequestrin signature 1.

CONSENSUS: [EQ]-[DE]-G-L-[DN]-F-P-x-Y-D-G-x-D-R-V.

NAME: Calsequestrin signature 2.

CONSENSUS: [DE]-L-E-D-W-[LIVM]-E-D-V-L-x-G-x-[LIVM]-N-T-E-D-
D-D.

NAME: S-100/ICaBP type calcium binding protein signature.
CONSENSUS : ELIVMFYtO ( 2 ) -x ( 2 ) -ELK3-D-X (3 ) -EDN3-X C 3 ) -EDNSG3- 

50 EFY3-x-EES3-EFYVC3-x<2)- 

C0NSENSUS: ELI VMFS 3-ELIVMF 3 • 



NAME: Hemolysin-type calcium-binding region signature.

CONSENSUS: D-x-[LI]-x(4)-G-x-D-x-[LI]-x-G-G-x(3)-D.

NAME: HlyD family secretion proteins signature. 

CONSENSUS : ELIVM3~x(2)-G-ELM3-x(3)-ESTGAV3-x-ELIVMT3-x- 
ELIVMT3-EGE3-x-EKR3-x- 
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CONSENSUS: ELIVMFYtO < 2 > -x-ELIVMFYtiO (3) - 

NAME: P-II protein urydylation site- 
CONSENSUS: Y-EKRJ-G-ICASD-EAEll-Y - 


NAME: P-II protein C-terminal region signature- 
CONSENSUS : CST3-X ( 3 > -G-EDYJ-G-EKRJ-ILIVl-EFliO-inLIVMJ-x < 2 ) - 

ELIVMI. 

NAME: 14-3-3 proteins signature 1.

CONSENSUS: R-N-L-[LIV]-S-[VG]-[GA]-Y-[KN]-N-[IVA].

NAME: 14-3-3 proteins signature 2.

CONSENSUS: Y-K-EDEl-S-T-L-I-EIMID-fl-L-ELFJ-ERHCl-D-N-ELFl-T- 
15 ELS3-U-ITAN3-ESAD3. 

NAME: ATP1G1 / PLM / MAT8 family signature.

CONSENSUS: EDNSI-x-F-x-Y-D-x ( 2) - ESTJ-ELIVMl-ERfllD-x ( 2 )-G - 

20 NAME: BTG1 family signature 1. 

CONSENSUS: Y-x(2)-EHP3-U-EFY3-EAP3-E-x-P-x-K-G-x-EGA3-EFY3-R- 
C-EIVI-ERHI-ErO. 

NAME : BTG1 family signature 2. 

25 CONSENSUS: ELV3-P-X-EDE J-ELM-EST3-ELIVM3-U-EIV3-D-P-X-E-V- 

ESC3-x-ER(31-x-G-E- 

NAME: Cullin family signature- 

CONSENSUS: ELIVI-K-x < 2 > -ILIVJ-x C 2 > -L-I-EDEfll-EKRHNCl-x-Y- 

30 ELIVM3-x-R-xCt.-.?)-EFYl-x- , 
CONSENSUS : Y-x-ESA3>. ' 

NAME: Cullin family profile. 

35 NAME: Enhancer. of . rudimentary .signature • 

CONSENSUS: Y-D-I-ESAl-x-L-EFYJ-x-F-ICIVl-D-x ( 3 ) -D-ELIVJ-S - 






NAME: G10 protein signature 1.

CONSENSUS: L-C-C-x-EKR J-C-x ( -OEJ-x-N-x C -C-x-C-R-V-P . 

NAME: G10 protein signature 2.
CONSENSUS: C-x-H-C-G-C-[KRH]-G-C-[SA].



NAME: Glucokinase regulatory protein family signature- 

45 CONSENSUS: G-EPA J-E-x-ELIVJ-ESTAJ-G-S-ESTZD-R-ELIVMI-K- 

ESTGA3K3>-xC2)-K- 



NAME: GTP1/OBG family signature.

CONSENSUS: D-ELIVMJ-P-G-ELIVM J ( 2 ) -OEYZD-EGNJ-A-x (2 ) -G-x-G - 

NAME: HIT family signature- 

CONSENSUS: ENl2A3-x ( M ) -EGAV3-x-E(2F3-x-ELIVM3-x-H-ELIVMFYT3-H- 

ELIVMFT3-H-CLIVMF3(2)- 

CONSENSUS: CPSGA3 • 

NAME: Caseins alpha/beta signature.
CONSENSUS: C-L-[LV]-A-x-A-[LVF]-A.

NAME: Clathrin adaptor complexes medium chain signature 1.
CONSENSUS: EI VT3-EGSP3-hJ-R-x (2 3 ) -EGAD3-X ( 2 ) -EHYl-x (2 ) -N-x- 

ELIVMAFY3C3)-D-ELIVM3- 
CONSENSUS: ELIVMT3-E- 

NAME: Clathrin adaptor complexes medium chain signature 2.
CONSENSUS: [LIV]-x-F-I-P-P-x-G-x-[LIVMFY]-x-L-x(2)-Y.

NAME: Clathrin adaptor complexes small chain signature.

CONSENSUS: [LIVM](2)-Y-[KR]-x(4)-L-Y-F.

NAME: Ependymins signature 1. 

CONSENSUS : F-E-E-G-X-ELIVMF3-Y-EED3-I-D-X ( 2 ) -N-E(2E 3-S-C- 

ERKH3(2>- 


NAME: Ependymins signature 2. 

CONSENSUS : E(2E3-ELIVMA3-F-x ( 2 ) -P-ESTA3-EFY3-C-EDE3-EGA3- 

ELIVM3-x(2>-EDE3<2) . 

20 NAME: Syntaxin / epimorphin family signature. 

CONSENSUS: ERCJ3-X ( 3 ) -ELIVMA3-X ( 2 ) -ELIVM3-EESH3-X (2 ) -ELIVMT3- 

_ x-EDEVn3-ELIVM3-x<2)- 

CONSENSUS: ELIVM3-EFS3-X (2 ) -ELIVM3-X ( 3 ) -ELIVT3-X (2 ) -t2- 

EGADEl23-x(2)-ELIVn3-EDNl2T3-x- 
25 CONSENSUS: ELIVMFJ-EDESV3-xC2)-CLIVM3. 

NAME: Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7
signature 1.

CONSENSUS: EGDER3-H-EFYUH3-T-G-ELIVM 3 ( 2 ) -W-x ( 2 ) -ESTN3 . 


NAME: Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7
signature 2.

CONSENSUS : ELI VMFYH3-ELI VI1F Y3-x-C-EN0RHS3-Y-x-EPARH3-x-EGL3- 

N-ELIVMFYUJDN3- 

NAME: Fetuin family signature 1. 

CONSENSUS: C-x ( 5b ) -C-x ( ID ) -C-x ( 13 ) -C-x ( 17-,lfi ) -C-x ( 13 ) -C-x (2 ) - 

C-x(Sfl)-C-x(lDill)- 

CONSENSUS: C-x ( ID -, 12 ) -C-x ( lb n 22 ) -C . 

NAME: Fetuin family signature 2.

CONSENSUS: L-E-T-x-C-H-x-L-D-P-T-P.

NAME : Legume lectins beta-chain signature. 

45 CONSENSUS: ELIV3-ESTAG3- V-EDEf2V3-EFLI3-]>-EST3 • 

NAME : Legume lectins alpha-chain signature. 

CONSENSUS : ELIV3-X-EED03-EF YUKR3-V-X-ELIV3-G-ELF3-EST3 . 

50 NAME: Vertebrate galactoside-binding lectin signature. 

CONSENSUS: W-EGEK3-x-EE<23-x-EKREJ-x(3-.fc,)-EPCTF3-ELIVMF3- 
ENUEGSKV3-x-EGH3-x(3)- 
CONSENSUS: EDENKHS3-ELI VI1FC3 . 

NAME: Lysosome-associated membrane glycoproteins duplicated
domain signature.

CONSENSUS: [STA]-C-[LIVM]-[LIVMFYW]-A-x-[LIVMFYW]-x(3)-
[LIVMFYW]-x(3)-Y.

NAME: LAMP glycoproteins transmembrane and cytoplasmic
domain signature.

CONSENSUS: C-x ( E ) -D-x (3 H ) -ELIVM3 ( E > -P-CLI VMD-x-ELI VM2-G- 

5 xC£)-ELIVMID-x-G-ELIVM:D(E)- 

CONSENSUS: x-ILIVMni ( M ) - A-CFYl-x-ELIVnJ-x ( E ) -CKR3-CRH3-X Cl-,2) 

ESTAGKE^Y-EEdJ. 

NAME: Glycophorin A signature- 

10 CONSENSUS: I-I-x-EGACH-V-M-A-G-ELIVMID (E) - 

NAME: PMP-22 / EMP / MP20 family signature 1.

CONSENSUS: ELI VMF 1 ( M ) -IS A U-T-x ( E ) -EDNKS3-x-W-x ( ^ -, 13 ) -ELIVJ-W 

x(E)-C- 


NAME: PMP-22 / EMP / MP20 family signature 2.
CONSENSUS: ERtiO-EAVID-x-M-EIVJ-L-S-x-ELIU-x (^-EGSAJ- 

ELIVMFJO) . 

20 NAME: Oxysterol-binding protein family signature. 
CONSENSUS: E-EKGD-x-S-H-EHR J-P-P-x-ESTACFJ-A • 






NAME: Yeast PIR proteins repeats signature- 
CONSENSUS: S-(3-EIV3-ESTGNH3-]>-G-^ELIV J-fi-E AIV3-ESTA2 . 

NAME: Seminal vesicle protein I repeats signature- 
CONSENSUS : EIVM3-x-G-<2-D-x-V-K-x(5)-EKN3-G-x(3)-ESTLV3. 



NAME: Seminal vesicle protein II repeats signature- 

30 CONSENSUS: EGSAID-fl-x-K-S-EFYl-x-fl-x-K-ES A3 - 

NAME: Serum amyloid A proteins signature- 

CONSENSUS : A-R-G-N- Y-EED3- A-x-E<2KRID-R-G-x-G-G- x-U- A - 

NAME: Spermadhesins family signature 1.

CONSENSUS: C-G-x(2)-[LI]-x(4)-G-x-I-x(9)-C-x-W-T.

NAME: Spermadhesins family signature 2.

CONSENSUS : C-x-K-E-x-ELIVM J-E-ELIVMJ-x-EDEiD-x (3) -EGSU-x ( 5 ) -K 

40 x-C- 

NAME: Stress-induced proteins SRP1/TIP1 family signature. 
CONSENSUS: P-U-Y-EST3 ( S ) -R-L . 

45 NAME: Glypicans signature. 

CONSENSUS: C-x (S) -C-x-G-ELIVMJ-x ( H ) -P-C-x ( E ) -EFYJ-C-x ( E ) - 

ELIVnH-x(E)-G-C 

NAME: Syndecans signature. 

50 CONSENSUS: EFYD-R-EIMID-EKRID-K ( E ) -D-E-G-S-Y - 

NAME: Tissue factor signature- 

CONSENSUS: W-K-x-K-C-x ( E ) -T-x-EDENH-T-E-C-D-ELIVMH-T-D-E - 

NAME: Translationally controlled tumor protein signature 1.

CON^EN^US: EIAJ-G-EGASJ-N-EPAJ-S-A-E-EGDEJ-EPAGEJ-xCOnl)- 
EDEG3-x-EDENJ-x(E)-EDE3- 
NAME: Translationally controlled tumor protein signature 2.
CONSENSUS : EFLJ-EFYJ-OVTJ-G-E-x-ENAJ-x(2-i5)-OENJ-ILGAS])-x- 
CLV3-D:AVJ-x(3)-CFYll-IEKR3- 
CONSENSUS: lEDED- 


NAME: Tub family signature 1.

CONSENSUS: F-EKH(2:D-G-R-V-EST3-x-A-S-V-K-N--F-t2 - 

NAME: Tub family signature 2- 
10 CONSENSUS: A-F-ICAGU-I-IESACD-IELI VMH-IESTll-S-F-x-IEGSTJ-K-x-A-C- 

E - 

NAME: HCP repeats signature.
CONSENSUS: H-R-H-R-G-H-x(2)-OE2<7) - 

NAME: Bacterial ice-nucleation proteins octamer repeat.
CONSENSUS: A-G-Y-G-S-T-x-T.






NAME: Cell cycle proteins ftsW / rodA / spoVE signature.

20 CONSENSUS: ENVJ-x ( 5 ) -EGTRID-ELIVnAJ-x-P-EPTLIVniB-x-G-ELIVMiD- 

x ( 3) - ELIVMFliO (2 ) - S-CYS AID- 
CONSENSUS : G-G-ESTNU-ESA}- 

NAME: Enterobacterial virulence outer membrane protein
signature 1.

CONSENSUS: G-[LIVMFY]-N-[LIVM]-K-Y-R-Y-E.

NAME: Enterobacterial virulence outer membrane protein
signature 2.
30 CONSENSUS: EFYhU-x (2 ) -G-x-G-Y-EKR3-F> . 

NAME: Hydrogenases expression/synthesis hypA family
signature.

CONSENSUS: F-ECSAiB-EFY J-EDEJ-ELI VAU ( 2 ) -x ( 3 ) -ESTD-ELI VrU- 

35 x(lb)-C-x(2)-C-x(12-.lS)- 
CONSENSUS: C-P-x-C- 

NAME : Hydrogenases expression/synthesis hupF/hypC family 

signature- 

40 CONSENSUS: <H-C-ELIV3-EGA3-ELIV3-P-x-Et2KR3-0:LIV31 . 

NAME: Staphylocoagulase repeat signature.

CONSENSUS: A-R-P-x(3)-K-x-S-x-T-N-A-Y-N-V-T-T-x(2)-[DN]-G-
x(3)-Y-G.

NAME: 11-S plant seed storage proteins signature.

CONSENSUS: N-G-x-[DE](2)-x-[LIVMF]-C-[ST]-x(11,12)-[PAG]-D.

NAME: Dehydrins signature 1.

50 CONSENSUS'. S(S)-EDE3-x-D:])ElI-G-x(l,n2)-G-x(D-,l)-EKR3C4) 

NAME: Dehydrins signature 2.

CONSENSUS: [KR]-[LIM]-K-[DE]-K-[LIM]-P-G.

NAME: Germin family signature.

CONSENSUS : G-x ( M ) -H-x-H-P-x- A-x-E-ELIVri3 - 

NAME: Oleosins signature- 
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CONSENSUS: EAG3-ESTJ--x(2)--EAG31-x(2>--ELIVMll-ESADni-T-P- 

elivmf:d(M)-f-s-p-elivm:d<3)- 
consensus: p-a- 

NAME: Small hydrophilic plant seed proteins signature.

CONSENSUS: G-[EQ]-T-V-V-P-G-G-T.

NAME: Pathogenesis-related proteins Betvl family signature. 

CONSENSUS: G-x ( 2 ) -ELIVMFHI-x ( M ) -E-x ( 2) -ECSTAENID-x (fi-, i ) -EGNDJ- 

10 G-EGS3-ECS3-x(B)-K-x(M)- 
CONSENSUS: EFYJ • 
NAME: Pollen proteins Ole e I family signature.
CONSENSUS: [EQ]-G-x-V-Y-C-D-T-C-R.

NAME: Thaumatin family signature- 

CONSENSUS: G-x-EGFl-x-C-x-T-EGAD-D-C-x ( 1-, B ) -G-x ( 5 -.3 ) -C . 



NAME: Mrp family signature- 

20 CONSENSUS: W-x ( 2 ) -ELIVMH-D-ELIVMYID ( M ) -D-x-P-P-G-T-EGSJ-D . 



NAME: Glucose inhibited division protein A family signature 

1 • 

CONSENSUS: EGSI-P-x-Y-C-P-S-ELIVnj-E-x-K-ELIVrO-x-EKRID-F . 

NAME: Glucose inhibited division protein A family signature 

2 • 

CONSENSUS : A-G-<3-x-ENT2-G-x(2)-G-Y-x-E-ESAGJ(3>-E(2SJ-G- 
ELIVM:DC2)-A-G-ELIVMT3-N-A. 

NAME: NOL1/NOP2/sun family signature.

CONSENSUS: [FV]-D-[KRA]-[LIVMA]-L-x-D-[AV]-P-C-[ST]-[GA].



NAME : PET112 family signature- 

35 CONSENSUS: EDN3-X-EDN3-R-X ( 3 ) -P-L-ELIV J-E-ELIV 3-X-EST2-X-P - 

NAME: Protein smpB signature- 

CONSENSUS: ETA3-G-ELIVM3-x-L-x-G-x-E-ELIVM3-EKc3ni-ESA J-ELIVM3 . 

40 NAME: Hypothetical cof family signature 1- 

CONSENSUS: ELIVFYANJ-ELIVMFAIS-x (2 ) -D-ELIVMF J-ENDJ-G-T-ELI V3- 

ELVY3-ESTANLM3 • 

NAME : Hypothetical cof family signature 2- 

45 CONSENSUS: ELIVMFCJ-G-D-EGSANdJ-x-N-D-x (3) -ELIMFYD-x (2 ) -EAV3- 

xC2)-EGSCPll-xCB>- 

CONSENSUS: ELMPl-x (2) -EGAS3 - 

NAME: RIO1/ZK632.3/MJ0444 family signature.

CONSENSUS: [LIVM]-V-H-[GA]-D-L-S-E-[FY]-N-x-[LIVM].

NAME : SUA5/yciO/yrdC family signature- 

CONSENSUS: ELIVMTA3(3)-ELIVMFYO-EPG2-T-EDE3-ESTAll-x-EFY3- 
EGA3-ELIVM3-IEGS3 - 



NAME: Uncharacterized protein family UPF0001 signature.

CONSENSUS: EFliU-H-EFMJ-EI Vl-G-x-ELIVJ-ta-x-ENKRJ-K-x ( 3 ) -ELIV3 . 
NAME: Uncharacterized protein family UPF0003 signature.

CONSENSUS: G-x-V-x ( 2 ) -ELIV3-X (3 ) -ESA3-X ( L> ) -D-x (3 )-CLIVT3 ( 3) - 

P-N-x(2)-0:LIVMF3<2)- 
CONSENSUS: x(S)-N- 

NAME: Uncharacterized protein family UPF0004 signature.

CONSENSUS: ELIVM3-x-CLIVMT3-x (2> -G-C-x C 3) -C-ESTAN3-EFY3-C-X- 

ELIVM3-x(t»)-G. 

NAME: Uncharacterized protein family UPF0005 signature.

CONSENSUS: G-ELIVM3<2)-ESA3-x(5-,6)-G-x(2)-ELIVM3-G-P-x-L- 
x(M)-ESA63-x(M-,b>- 

CONSENSUS: [LIVM](2)-x(2)-A-x(3)-T-A-[LIVM](2)-F.

NAME: Uncharacterized protein family UPF0006 signature 1.

CONSENSUS: [LIVMFY](2)-D-[STA]-H-x-H-[LIVMF]-[DN].

NAME: Uncharacterized protein family UPF0006 signature 2.
CONSENSUS: P-[LIVM]-x-[LIVM]-H-x-R-x-[TA]-x-[DE].

NAME: Uncharacterized protein family UPF0006 signature 3.
CONSENSUS: [LVSA]-[LIVA]-x(2)-[LIVM]-[PS]-x(3)-L-[LIVM]-
[LIVMS]-E-T-D-x-P.

NAME: Uncharacterized protein family UPF0007 signature.
CONSENSUS: V-L-[IV]-H-D-[GA]-A-R.

NAME: Uncharacterized protein family UPF0011 signature.
CONSENSUS: S-D-A-G-x-P-x-[LIV]-[SN]-D-P-G.

NAME: Uncharacterized protein family UPF0012 signature.
CONSENSUS: IEGTAI-x ( 2 ) -EI VT3-C-Y-D-ELI VM J-x-F-P-x (=1 ) -G . 



NAME: Uncharacterized protein family UPF0015 signature.

35 CONSENSUS:- EDE3-ELIVMF3 (3 > -R-T-ESG3-G-X (2 > -R-X-S-X-EFY3- 

ELIVM3<2)-U-<2. 



NAME: Uncharacterized protein family UPF0016 signature.

CONSENSUS: E-[LIVM]-G-D-K-T-F-[LIVMF](2)-A.

NAME: Uncharacterized protein family UPF0017 signature.

CONSENSUS: D-x (fl ) -EGN3-ELFY3-X ( H ) -CDET3-ELY3-Y-X (3 ) -EST3- 
x(?>-EIV3-x(2)-EPS3-x- 

CONSENSUS: ELIVM3-x-ELIVM3-x ( 3) -EDN3-D . 

NAME: Uncharacterized protein family UPF0019 signature.

CONSENSUS: L-P-V-E VT3-EN(3L J-F-EAT3-A-G-G-ILIV3-A-T-P-A-D-A-A- 
ELM3 • 



NAME: Uncharacterized protein family UPF0020 signature.

CONSENSUS: D-P-[LIVMF]-C-G-[ST]-G-x(3)-[LI]-E.



NAME: Uncharacterized protein family UPF0021 signature.
CONSENSUS: C-K-x(2)-F-x(4)-E-x(22,23)-S-G-G-K-D.

NAME: Uncharacterized protein family UPF0023 signature.
CONSENSUS: D-x-D-E-[LIV]-L-x(4)-V-F-x(3)-S-K-G.

NAME: Uncharacterized protein family UPF0024 signature.

CONSENSUS: G-x-K-D-EKR3-x-A-ELV J-T-x-fl-x-ELIVFID-ESGCl - 

NAME: Uncharacterized protein family UPF0025 signature.

CONSENSUS: D-V-[LIV]-x(2)-G-H-[ST]-H-x(12)-[LIVMF]-N-P-G.

NAME: Uncharacterized protein family UPF0027 signature.

CONSENSUS: (3-ELIVMl-x-N-x- A-x-IELIVnU-P-x-I-x ( t, ) -ELIVMH-P-D-x 

H-x-G-x-G-x(2)-EIV3-G- 

NAME: Uncharacterized protein family UPF0028 signature.

CONSENSUS : EGA3-EGS3-G-EGA3- A-R-G-x-ESA3-H-x-G-x ) -EIV3-X- 

EIVJ-D-xCB)-EGA3-G-x-S- 
CONSENSUS: x-G- 

NAME: Uncharacterized protein family UPF0029 signature.

CONSENSUS: G-x (E) -ELIVM3 ( 2 ) -x (2 ) -ELIVM3-X ( M ) -ELIVM3-X ( 5 ) - 

ELIVM3(2)-x-R-EFYU3(E)-G- 
CONSENSUS : G-x (2 ) -ELIVM3-G - 

NAME: Uncharacterized protein family UPF0030 signature.

CONSENSUS: [GA]-L-I-[LIV]-P-G-G-E-S-T-[STA].



NAME: Uncharacterized protein family UPF0031 signature 1.

25 CONSENSUS: ESAV3-EIVU3-ELVA3-ELIV3-G-EPNS3-G-L-EGP3-X- 
EDENflTJ. 

NAME: Uncharacterized protein family UPF0031 signature 2.

CONSENSUS: [GA]-G-x-G-D-[TV]-[LT]-[STA]-G-x-[LIVM].



NAME: Uncharacterized protein family UPF0032 signature.

CONSENSUS: Y-x (2 ) -F-ELI VM AD ( 2 ) -x-L-x C M ) -G-x ( 2 ) -F-EEG3- 

ELIVMF3-P-ELIVM3 - 



NAME: Uncharacterized protein family UPF0033 signature.

CONSENSUS: L-[DN]-x(2)-[TAG]-x(2)-C-P-x-P-x-[LIVM].

NAME: Uncharacterized protein family UPF0034 signature.
CONSENSUS: ELIVM3-EDNG3-ELIVM3-N-X-G-C-P-X ( 3 ) -ELIVMASt33-x ( S ) 

40 G-ESACJ. 

NAME: Uncharacterized protein family UPF0035 signature.
CONSENSUS: L-L-T-x-R-[SA]-x(3)-R-x(3)-G-x(3)-F-P-G-G.



NAME: Uncharacterized protein family UPF0036 signature.

CONSENSUS: H-x-S-G-H-[GA]-x(3)-[DE]-x(3)-[LM]-x(5)-P-x(3)-
[LIVM]-P-x-H-G-[DE].

NAME: Uncharacterized protein family UPF0038 signature.

50 CONSENSUS: G-x-ELI 3-x-R-x (2 ) -L-x ( H ) -F-x ( fl ) -ELIV3-X ( 5 ) -P-x- 

ELIV3- 

NAME: Uncharacterized protein family UPF0044 signature.

CONSENSUS: L-EST3-X (3) -K-x C 3 > -EKR3-ESGA3-x-EGA3-H-x-L-x-P- 

55 ELIV3-x<2)-inLIV3-EGA3- 
CONSENSUS: x(E)-G- 

NAME: Uncharacterized protein family UPF0047 signature.

WO 01/98454 PCT/IB01/02050 
CONSENSUS' S-X(S)-CLIVJ-x~Q:LIV3-x(2)-G-x(M)-6-T»W-(3-x-ELIV3- 

NAME: Uncharacterized protein family UPF0054 signature.
CONSENSUS: H-EGSl-x-L-H-L-ELIJ-C-EF YliO-D-H - 


NAME: Uncharacterized protein family UPF0057 signature.
CONSENSUS: ELIVJ-x-IESTAID-IELIVFID C3) -P-P-fllLIVAID-EGAID-EIVID-x CM )- 

EGKN3- 

10 NAME: Hypothetical YERD5?c/yjjV family signature- 

CONSENSUS: P-EATl-R-ESAa-x-ELIVflYH-x (2) -EAK3-x-L-P-x(t)- 

ELIVIO-E- 

NAME: Hypothetical hesB/yadR/yfhF family signature.
15 CONSENSUS: F-x-ELIVMFYl-x-N-lEPGni-lCNSO-x ( M ) -C-x-C-EGSIB-x-S-F - 

NAME: Hypothetical yabO/yceC/sfhB family signature.
CONSENSUS: ENHYD-R-ILLIIB-D-x ( 2 ) -T-EST J-G-ELIVMAl-ELIVriFl) ( 2 ) - 

ELIVUFG J-CSGAC3 - 
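
The CONSENSUS entries listed above appear to follow PROSITE pattern
notation: "x" for any residue, "[ABC]" for a set of permitted residues,
"{ABC}" for excluded residues, and "(n)" or "(n,m)" for repeat counts.
By way of illustration only, and assuming that notation, the following
minimal sketch translates such a pattern into a regular expression and
scans a protein sequence for it. The function names and the test
sequence are hypothetical and form no part of the disclosure; PROSITE
features beyond those named here are not handled.

```python
import re

# One pattern element: x, a single residue, [allowed] or {forbidden},
# optionally followed by a repeat count such as (3) or (2,4).
_ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    body = pattern.strip().rstrip(".")
    pieces = []
    for raw in body.split("-"):
        elem = raw.strip()
        prefix = suffix = ""
        if elem.startswith("<"):  # pattern anchored at the N-terminus
            prefix, elem = "^", elem[1:]
        if elem.endswith(">"):  # pattern anchored at the C-terminus
            suffix, elem = "$", elem[:-1]
        match = _ELEMENT.fullmatch(elem)
        if match is None:
            raise ValueError(f"unsupported pattern element: {raw!r}")
        core, low, high = match.groups()
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"
        else:
            regex = core
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        pieces.append(prefix + regex + suffix)
    return "".join(pieces)


def find_motif(pattern: str, sequence: str):
    """Return (start, matched_text) pairs; positions are 0-based."""
    compiled = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in compiled.finditer(sequence)]


if __name__ == "__main__":
    # Hypothetical sequence containing a match to a cleaned-up example pattern.
    print(find_motif("[PT]-G-K-H-G-x-A-K.", "MATGKHGHAKV"))
```

Entries whose characters were corrupted during text extraction would
need to be restored to this notation before such a scan is attempted.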




Deposit of Clones

Each clone has been transfected into separate bacterial
cells (E. coli) in the composite deposit.

The clones are held at, and publicly available from, the
Resource Center of the German Human Genome Project (Heubner Weg
6, 14059 Berlin, GERMANY), from which each clone comprising a
particular polynucleotide is obtainable. The Resource Center
library numbers are slightly different than those presented here,
but may be readily obtained by the following key or with the
assistance of Resource Center personnel.

The library name becomes a number: brain (hfbr2) becomes
564, kidney (hfkd2) becomes 566, mammary carcinoma (hmcf1)
becomes 727, testis (htes3) becomes 434, amygdala (hamy2) becomes
761, melanoma (hmel2) becomes 762 and uterus (hute1) becomes 586.
Next, the plate number is converted to two digits (e.g., "2"
becomes "02") and is moved behind the plate coordinate, and the
underscore is dropped. The following examples are helpful:

Listed Number          Resource Center Number

DKFZphamy2_10h17       DKFZp761H1710
DKFZphfbr2_78i21       DKFZp564I2178
DKFZphfkd2_3k1         DKFZp566K013
DKFZphmcf1_1c23        DKFZp727C231
DKFZphmel2_12j1        DKFZp762J0112
DKFZphtes3_16b5        DKFZp434B0516
DKFZphute1_17k7        DKFZp586K0717
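
The conversion described by the key above can be sketched in code.
The following is offered only as an illustration under stated
assumptions: the library numbers are this sketch's reading of the key
(564, 566, 727, 434, 761, 762 and 586) and should be confirmed with the
Resource Center, and the function and variable names are hypothetical.
In the worked examples it is the numeric part of the well coordinate
that appears zero-padded to two digits (e.g., "k1" becomes "K01"), with
the plate number appended as written; the sketch follows the examples.

```python
import re

# Library name -> Resource Center library number (assumed reading of the key).
LIBRARY_NUMBERS = {
    "hfbr2": "564",  # brain
    "hfkd2": "566",  # kidney
    "hmcf1": "727",  # mammary carcinoma
    "htes3": "434",  # testis
    "hamy2": "761",  # amygdala
    "hmel2": "762",  # melanoma
    "hute1": "586",  # uterus
}

# DKFZp<library>_<plate><row letter><column>, e.g. DKFZphfkd2_3k1
_LISTED = re.compile(r"DKFZp([a-z]+\d)_(\d+)([a-z])(\d+)")


def to_resource_center_number(listed: str) -> str:
    """Convert a listed clone number into the Resource Center form."""
    match = _LISTED.fullmatch(listed)
    if match is None:
        raise ValueError(f"unrecognized listed number: {listed!r}")
    library, plate, row, column = match.groups()
    if library not in LIBRARY_NUMBERS:
        raise ValueError(f"unknown library: {library!r}")
    # Row letter upper-cased, column padded to two digits, plate appended.
    return "DKFZp" + LIBRARY_NUMBERS[library] + row.upper() + column.zfill(2) + plate


if __name__ == "__main__":
    for name in ("DKFZphamy2_10h17", "DKFZphfkd2_3k1", "DKFZphmcf1_1c23"):
        print(name, "->", to_resource_center_number(name))
```

A listed number that does not fit the assumed layout raises an error
rather than producing a guessed identifier, in which case the
assistance of Resource Center personnel remains the safer route.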
The libraries were constructed using two commercially
available vectors. The brain (hfbr2 designations) and kidney
(hfkd2 designations) libraries utilize pAMP 1 from Life
Technologies and are maintained in XL-2Blue (Stratagene); the
amygdala (hamy2), testes (htes3) and melanoma (hmel2) libraries
are constructed in pSPORT1, also from Life Technologies, and are
maintained in DH10B (Life Technologies). In addition to the
following techniques, consultation with the commercial literature
available on these clones will make evident all of the
housekeeping techniques needed to propagate and isolate the
individual constructs. All inserts may be excised with a
NotI/SalI digestion. Alternatively, universal primers, flanking
the cloning region, may be used to amplify the inserts using PCR
methods.

Bacterial cells containing a particular clone can be 
obtained from the composite deposit as follows: 

An oligonucleotide probe or probes should be designed to the
sequence that is known for that particular clone. This sequence
can be derived from the sequences provided herein, or from a
combination of those sequences. Methods of probe design are
presented below.

Oligonucleotide probes may be labeled with gamma-32P ATP
(specific activity 6000 Ci/mmole) and T4 polynucleotide kinase
using commonly employed techniques for labeling oligonucleotides.
Other, non-radioactive labeling techniques can also be used.
Unincorporated label typically is removed by gel filtration
chromatography or other established methods. The amount of
radioactivity incorporated into the probe can be quantified by
measurement in a scintillation counter. Preferably, the specific
activity of the resulting probe should be approximately
4 x 10^6 dpm/pmole.

The bacterial culture containing the pool of full-length
clones should preferably be thawed and 100 µl of the stock used
to inoculate a sterile culture flask containing 25 ml of sterile
L-broth containing ampicillin at 50 - 100 µg/ml (for XL-2Blue
strains 25 µg/ml tetracycline should also be used). The culture
should preferably be grown to saturation at 37°C, and the
saturated culture should preferably be diluted in fresh L-broth.
Aliquots of these dilutions should preferably be plated to
determine the dilution and volume which will yield approximately
5000 distinct and well-separated colonies on solid
bacteriological media containing L-broth containing ampicillin at
100 µg/ml (for XL-2Blue strains 25 µg/ml tetracycline should also
be used) and agar at 1.5% in a 150 mm petri dish when grown
overnight at 37°C. Other known methods of obtaining distinct,
well-separated colonies can also be employed.

Standard colony hybridization procedures should then be used
to transfer the colonies to nitrocellulose filters and lyse,
denature and bake them. The filter is then preferably incubated
at 65°C for 1 hour with gentle agitation in 6 x SSC (20 x stock
is 175.3 g NaCl/liter, 88.2 g Na citrate/liter, adjusted to pH
7.0 with NaOH) containing 0.5% SDS, 100 µg/ml of yeast RNA, and
10 mM EDTA (approximately 10 mL per 150 mm filter). Preferably,
the probe is then added to the hybridization mix at a
concentration greater than or equal to 1 x 10^6 dpm/mL. The filter
is then preferably incubated at 65°C with gentle agitation
overnight. The filter is then preferably washed in 500 mL of 2 x
SSC/0.5% SDS at room temperature without agitation, preferably
followed by 500 mL of 2 x SSC/0.1% SDS at room temperature with
gentle shaking for 15 minutes. A third wash with 0.1 x SSC/0.5%
SDS at 65°C for 30 minutes to 1 hour is optional. The filter is
then preferably dried and subjected to autoradiography for
sufficient time to visualize the positives on the X-ray film.
Other known hybridization methods can also be employed.
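
The quantities in the hybridization procedure above lend themselves to
a short worked calculation. The following minimal sketch assumes the
figures are read as a probe concentration of 1 x 10^6 dpm/mL, a probe
specific activity of 4 x 10^6 dpm/pmole, about 10 mL of hybridization
mix per filter, and a 20 x SSC stock; the function names are
hypothetical.

```python
def probe_pmol_needed(volume_ml: float,
                      dpm_per_ml: float = 1e6,
                      specific_activity_dpm_per_pmol: float = 4e6) -> float:
    """Picomoles of labeled probe needed to reach the target dpm/mL."""
    return volume_ml * dpm_per_ml / specific_activity_dpm_per_pmol


def stock_volume_ml(final_volume_ml: float,
                    final_strength_x: float,
                    stock_strength_x: float = 20.0) -> float:
    """Volume of concentrated stock to dilute (C1 * V1 = C2 * V2)."""
    return final_volume_ml * final_strength_x / stock_strength_x


if __name__ == "__main__":
    # Probe for one 150 mm filter in about 10 mL of hybridization mix.
    print("probe (pmol):", probe_pmol_needed(10))
    # 20 x SSC needed for 10 mL of 6 x SSC and for 500 mL of 2 x SSC wash.
    print("20x SSC for hybridization (mL):", stock_volume_ml(10, 6))
    print("20x SSC for first wash (mL):", stock_volume_ml(500, 2))
```

Under these assumptions a single filter calls for roughly 2.5 pmole of
probe at the minimum stated concentration; more probe would be used
where a higher concentration is desired.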

The positive colonies are picked, grown in culture, and
plasmid DNA isolated using standard procedures. The clones can
then be verified by restriction analysis, hybridization analysis,
or DNA sequencing.

Alternatively, clones may be grown as described above, and
PCR used to isolate the insert DNAs. Methods of PCR are
described below and are otherwise well known.

ERROR SCREENING 

The DNA sequences found herein derive from individual clones,
which are publicly available, as noted above. Thus, the skilled
artisan will recognize that any specific sequence disclosed herein
readily can be screened for errors by resequencing a particular
fragment in both directions (i.e., by sequencing both strands).
Alternatively, error screening can be performed by amplifying
and/or cloning any of the inventive DNAs, using for example
RT-PCR, and sequencing the resulting amplified product. In the
event that there is a sequencing error, reference should be made
to the deposited clone as the correct sequence.
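
Screening by sequencing both strands amounts to checking that the read
from one strand agrees with the reverse complement of the read from
the other. The following minimal sketch illustrates that comparison
for two reads assumed to cover exactly the same fragment; the function
names and example reads are hypothetical, and real reads would first
need to be trimmed and aligned.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def strand_disagreements(forward_read: str, reverse_read: str):
    """Return 0-based forward-strand positions where the two reads disagree."""
    reverse_as_forward = reverse_complement(reverse_read)
    if len(forward_read) != len(reverse_as_forward):
        raise ValueError("reads must cover the same fragment")
    return [
        i
        for i, (f, r) in enumerate(zip(forward_read.upper(), reverse_as_forward))
        if f != r
    ]


if __name__ == "__main__":
    # Hypothetical reads; the second reverse read carries one discrepancy.
    print(strand_disagreements("ATGCCGTA", "TACGGCAT"))
    print(strand_disagreements("ATGCCGTA", "TACGGGAT"))
```

Any position reported in this way would be a candidate for
resequencing or for comparison against the deposited clone.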

USES AND BIOLOGICAL ACTIVITIES OF THE INVENTIVE MOLECULES 

The inventive molecules and their derivatives are susceptible
to a wide variety of uses, based on functional and/or structural
properties. The skilled worker will appreciate, based on the
biological activities detailed below, and discussed with regard to
the individual sequences herein, that the inventive molecules will
find usefulness in numerous therapeutic and diagnostic
applications.

The DNA molecules, especially the potassium salts thereof,
can be used as fertilizer supplements due to their high nitrogen
and phosphorus contents. Since the DNAs are of defined length,
they are also useful in gel electrophoresis as molecular weight
markers. Due to their similarity with known molecules, certain of
the DNA molecules and their variants and derivatives may be used
in any number of different diagnostic procedures and therapeutic
applications. They may also be used to make the encoded proteins.

The proteins themselves have many possible uses. They may be
used as a nutritional supplement for humans, animals and even for
laboratory use as, for example, medium for bacterial cultures.
Moreover, since the proteins are of defined, known sizes, they may
be used as molecular weight markers for gel electrophoresis and
gel filtration. Because they are of defined sequences, they also
have use in microsequencing and protein fingerprinting
applications.

Expression Profiling Applications 

Given their known tissue expression and functional
associations, assemblages of the inventive proteins (or
corresponding antibodies) and nucleic acids are particularly
suited to expression profiling applications. Expression profiling
generally entails constructing an array of indicators that signal
the presence of a particular RNA or protein expression product.
Such arrays can be used to evaluate, for example, pharmacological
effectiveness and toxicity. In particular, expression profiles
from such arrays can be generated from cells treated with known
compounds, having known properties, and these profiles can be
compared to profiles of unknowns to evaluate similarities and
differences, which can be correlated with efficacy or toxicity.

Additional uses of profiling include diagnosis, tracking
development, and ascertaining signaling and metabolic pathways.

For examples of references describing profiling and its uses,
see Farr et al., U.S. Patent 5,811,231 (1998), Seilhamer et al.,
U.S. Patent 5,840,484 (1998), Rine et al., U.S. Patent No.
5,777,888 (1998), WO 97/27317, WO 99/05323, WO 99/09218, and WO
99/14369. For a device for implementing such techniques, see
Lipshutz et al., U.S. Patent No. 5,856,174 (1999) and Anderson et
al., U.S. Patent No. 5,922,591 (1999).

In one embodiment, a subset of the inventive DNAs will be
arrayed on a substrate, like a gene chip, a filter or a 96-well
plate. Test samples containing cells are maintained in the
presence of a label capable of incorporation into nascent mRNA.
Samples are treated with test and control compounds, which will
induce mRNA expression in the sample, resulting in incorporation
of label. Whole mRNA is isolated and applied to the array such
that it hybridizes with the DNAs contained therein. After
washing, the amount of hybridization is quantified and a profile
is generated. These steps are repeated with various control and
test compounds, thereby generating a library of profiles, which
can be used to ascertain the relationships relevant to
pharmacological efficacy or toxicity.
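
One simple way to compare the profile of an unknown against a library
of reference profiles is to rank the references by correlation. The
following minimal sketch uses the Pearson correlation coefficient for
this purpose; the choice of metric, the function names and the toy
profile values are assumptions made for illustration and are not
prescribed by the method described above.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / math.sqrt(var_a * var_b)


def rank_references(unknown, library):
    """Rank reference profiles by similarity to the unknown, best first."""
    scores = [(name, pearson(unknown, profile)) for name, profile in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical hybridization signals for four array positions.
    library = {
        "reference_compound_A": [1.0, 2.0, 3.0, 4.0],
        "reference_compound_B": [4.0, 3.0, 2.0, 1.0],
    }
    print(rank_references([2.0, 4.0, 6.0, 8.0], library))
```

In practice the signals would first be normalized across arrays, and
other similarity measures may be equally suitable.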

The matrices used in such profiling, however, need not be
limited to those utilizing DNAs. Rather, other nucleic acids,
like RNAs and peptide nucleic acids (PNAs), as well as the
inventive proteins and antibodies corresponding to the inventive
proteins may also be employed. Hence, for example, antibodies
could form the array and the samples could be treated in order to
label nascent proteins. Whole proteins then would be isolated and
applied to the antibody matrix. Developing the resulting signal
would result in a protein expression profile, which is useful in
essentially the same manner as the nucleic acid profile. A
protein matrix could be used, for example, in evaluating antibody
responses to pharmaceutical agents in order to eliminate possible
cross-reactivity.

Moreover, where nucleic acids are used in the matrix, it is
often beneficial to use variants (as defined below) of the
molecules described herein. This can be used to account for
genetic variations that are of little or no consequence to the
function of the resultant gene product. Hence, they can account
for wobble or conservative amino acid variations that do not
perturb function, like variations in some of the protein motifs
elucidated below. Thus, each position in the matrix can employ
multiple nucleic acid probes that account for a series of
variants.

Expression profiling may also be done, in another embodiment,
using two-dimensional protein gels in which the inventive proteins
are detected. The resultant profiles can be used in the same way
as described.

Matrices useful for profiling may be constructed based on
different criteria. Of course, the more relevant profiles will
take into account expression of most human genes, preferably all
of them. In certain situations, however, it is advantageous to
look at a smaller subset. For example, if one were concerned
about fetal neural toxicity, a fetal brain-specific matrix might
be chosen. On the other hand, if one were interested in targeting
mammary carcinoma tissue, a corresponding matrix could be used.
Thus, matrices may be constructed using all of the sequences
available from a tissue-specific library.

* * * 

The following discussion relates to some of the various
functional and structural groupings that would be of interest to
the artisan wishing to construct profiling matrices. Of course,
the artisan will also recognize that these functional
descriptions may find additional applicability in the therapeutic
and diagnostic applications discussed below.

Cell Cycle 

A proliferating cell must coordinate replication and
chromosomal separation to ensure that the genome is replicated
completely, and that a single copy is correctly inherited by each
daughter cell. The cell cycle is the coordinated series of events
that achieves these aims. Many of the key events are initiated by
a family of conserved serine/threonine protein kinases, the
cyclin-dependent kinases (CDKs), that are activated by the
cyclin family of proteins (cyclins A-H). In turn, the cyclin-CDK
complexes are modulated by other protein kinases or phosphatases,
and by binding specific inhibitor proteins. The enormous variety
of ways in which CDK activity can be regulated allows the cell
to respond to internal signals generated by preceding events in
the cell cycle and to external growth signals.

The somatic cell cycle is divided into four phases: DNA
replication (S phase) and chromosome separation (M phase) are
separated by gap phases (G1 and G2). At specific control points
the decision to begin the next stage (DNA synthesis or mitosis)
is carefully regulated.

Cdc2, the primary kinase, is especially required for the G1-S
transition and S phase. Cdc4 and Cdc6 are involved at the
restriction point, where the cell can decide to proliferate or
arrest (G1->G0), and Cdc7 is a CDK activating kinase (CAK) as
well as a subunit of TFIIH.

The cyclin-CDK complexes are regulated in various ways. One
is through phosphorylation by CDK activating kinases (CAK), like
the Y15 kinase (Wee1), and dephosphorylation by CDK associated
phosphatases (CAP), like Cdc25A, a member of the Cdc25 family
(Cdc25A, B and C).

Another way of regulation occurs through two classes of CDK
inhibitors (CKI): first, the INK4 proteins p15, p16, p18 and p19,
which negatively regulate the cyclin D-CDK complexes, and second,
the p21 family with p21, p27 and p57.

The cell cycle is also regulated through ubiquitin-mediated
proteolysis involving the destruction of both cyclins and CDK
inhibitors by the 26S proteasome, which requires a ubiquitin
conjugating enzyme (UBC) and a ubiquitin ligase. The instability
is conferred by PEST regions (cyclins D and E) or a ten amino acid
region in the amino terminus (degradation box) in the A- and
B-type cyclins.



All these modifications play an important role in cellular
localization, because only the nuclear CDK-cyclin complexes are
functional in the cell cycle. During G1 phase of the cell cycle,
cyclins A, E and D are synthesized and bind to their
cyclin-dependent kinase (CDK) partners. CDK complexes containing
cyclins A, E and D1 are then imported into and concentrated
within nuclei. Cdk6-cyclin D3 has been localized to both
cytoplasmic and nuclear compartments, although only the nuclear
complex is active. As cells enter S phase, cyclin A and cyclin E
complexes remain within the nucleus, whereas cyclin D1
relocalizes to the cytoplasm for proteolysis at the onset of S
phase. Like Cdk2-cyclin A, Cdc2-cyclin A is nuclear and remains
so until it is degraded during mitosis. By contrast, as a result
of ongoing nuclear import and more rapid re-export, cyclin B1,
which binds to Cdc2 upon synthesis during S phase, is
predominantly cytoplasmic. Cdc2-cyclin B2 is also cytoplasmic,
although this might occur through anchoring of the complex to
some cytoplasmic constituent. At prophase, phosphorylation of
cyclin B1 promotes accumulation of Cdc2-cyclin B1 in the nucleus,
whereas cyclin B2 remains in the cytoplasm until nuclear envelope
breakdown.

Two crucial regulators of Cdc2-cyclin B, Wee1 and Cdc25C,
exist and are responsible for the G2 to M control point. Wee1 is
a nuclear protein throughout the cell cycle, whereas Cdc25C binds
to 14-3-3 proteins during interphase and remains predominantly
cytoplasmic. In some systems Cdc25C, like cyclin B1, rushes
precipitously into the nucleus just before entry into mitosis.

The 110-kDa retinoblastoma (tumor suppressor) protein (RB),
a pRB-family member, is an important regulator of cell-cycle
progression and differentiation. Like the E2F family (E2F1-5) or
DP family (DP1-3) of transcription activators, RB suppresses
inappropriate proliferation by arresting cells in G1 by
repressing the transcription of genes required for the transition
into S phase. Before the cell proceeds into S phase, RB becomes
phosphorylated at multiple sites by the cyclin dependent protein
kinases (CDKs) and loses its transcriptional repressing activity.
Phosphorylation of RB during late G1 phase results in the
dissociation of the E2F-RB repressor complex, which allows S-phase
specific genes to be transcribed. Cyclin E is the evolutionarily
conserved target for E2F and interacts together with CDCH in late

For a proliferating cell it is vital that only undamaged DNA
is replicated, because if DNA damage is substantial, its
replication can lead to chromosome loss or rearrangement. Thus,
we find a G1->S checkpoint in late G1 that requires the tumor
suppressor p53. A p53-dependent G1 arrest is effected by the
cyclin dependent kinase inhibitor p21, whose higher expression
levels inhibit almost all cyclin-CDK complexes.

The kinase responsible for phosphorylating the unidentified
kinetochore component in metaphase may be a member of the MAP
kinase family and appears to be the proto-oncogene c-MOS, a
cytostatic factor (CSF) in meiosis.

Several categories of proteins are coded for by clones of
the invention within the overall group of "Cell cycle" and
include, among others, the following:



PA26-T2 proteins: PA26-T2 is a p53 responsive gene. The protein is
predominantly expressed in brain, breast and kidney and
represents a novel regulator of cellular growth. Isoforms are
differentially induced by genotoxic stress (UV, gamma-irradiation
and cytotoxic drugs) in a p53-dependent manner. The p53 tumor
antigen is found in increased amounts in a wide variety of
transformed cells. The protein is also detectable in many
actively proliferating, nontransformed cells, but it is
undetectable or present at low levels in resting cells. p53 is
postulated to bind as a tetramer to a p53-binding site (PBS) and
to activate the expression of adjacent genes that inhibit growth
and/or invasion. Deletion or inactivation of one or both p53
alleles reduces the expression of tetramers, resulting in
decreased expression of the growth inhibitory genes. This
mechanism is found in tumors of several types (OMIM 191170).
Clones in this category include: amyB_121mg

Cell structure and motility

One of the major differences between prokaryotes and
eukaryotes is the ability of the eukaryotic cell to adopt very
different shapes dependent on its function during the
differentiation process. Animal cells vary from being round to
extended cylindrical forms like motor neurons or muscle cells. In
humans, more than 100 different cell types can be distinguished,
each having a characteristic shape. The form of a cell often is
closely related to its capacity to move. Some completely
differentiated cells like fibroblasts can still change their form
actively, thereby migrating. Other cell types serve as motor
elements, "macroscopically" like muscle cells or
"microscopically" like ciliated epithelia. Such tasks are
fulfilled by a large class of proteins, responsible on the one
hand for maintenance of cell structure and for contacting
neighbor cells or the intercellular matrix, and on the other hand
for cell motility. These topics cannot be regarded separately:
the motility apparatus, e.g., must be fixed in the cytoskeleton.
Three different types of filaments can be distinguished: actin
filaments, tubulin filaments and intermediate filaments, each
present in almost all types of cells.

Actin filaments (F-actin) are built up of monomers
(G-actin). In muscle cells, actin and myosin, for both of which
several paralogous genes are known, as well as many more
proteins, are constituents of the contractile apparatus.

The "thin" and "thick filaments" in a muscle cell consist
mainly of actin and myosin, respectively.

Several different proteins are responsible for the anchoring 
of the actin filaments in the Z-disks (e.g. alpha-actinin and 
desmin) or at the end of the myofibers in the cell membrane. 

Troponin I, C and T and tropomyosin, associated with actin,
confer the Ca++-dependent triggering of contraction.

Length of the sarcomere is controlled by the giant protein
titin.

In smooth muscle, there is no troponin. Contraction activity
is controlled by phosphorylation/dephosphorylation of myosin by
a specialized kinase instead. Contractile fibers are not
organized in sarcomeres.

Apart from contributing to muscle contraction, the
actomyosin system is responsible for many other motions at the
cellular level, e.g. the amoeboid movement of pseudopodia or the
fission of cells at the end of mitosis by a contractile ring.

Besides this, actin fibers fulfill structural tasks like
maintenance of the shape of stereocilia or microvilli. Here,
actin filaments are connected by proteins like fimbrin. But actin
fibers are not confined to such specialized structures. There is
a network covering the complete cell volume with F-actin as a
major constituent. Whereas the actin filaments in the structures
mentioned above are relatively stable, this F-actin is highly
dynamic. Management of the network structure is achieved by
connecting proteins like alpha-actinin, fimbrin or filamin;
turnover is regulated by gelsolin, villin, and different capping
and fragmentation proteins.

Microtubules are built up of alpha-beta tubulin
heterodimers. Turnover of filaments is achieved by incorporation
and release of monomers at different rates at the two ends. The
resulting cycle is called "treadmilling". Thirteen strings of
tubulin duplets build up one subfiber, whereas one fiber contains
two or three of those. A complete axoneme consists of 9 radial
and 2 central fibers. This "9+2" structure is the basis of
flagella, their basal bodies and centrioles. In flagella, several
additional structures like radial elements exist. Nexin connects
the fibers, and dynein is the motor ATPase which shifts the
fibers relative to each other. Several genetic diseases like the
Kartagener syndrome are caused by deficiencies of distinct
proteins in cilia.

Besides this, microtubules are abundant in all types of
cells. They are part of a delivery system for organelles, e.g. in
the Golgi apparatus. A further very important system based on
microtubules is the mitotic spindle; it is organized by the
centrosomes. Besides many other components, the major part of a
centrosome is two centrioles, which are built up of nine
microtubule triplets. Most remarkably, new centrioles are not
synthesized de novo but generated by duplication of old ones.

Cytoplasmic microtubules are associated with many different
proteins. Two major classes are known: the MAPs
("microtubule-associated proteins", with molecular masses between
200 and 300 kD) and the much smaller tau proteins with a
molecular mass between 60 and 70 kD. These proteins regulate the
treadmilling process and the interaction with other structures in
the cell.

Besides actin and myosin, the so-called intermediate
filaments constitute a third class of filaments. In contrast to
the former two groups, they do not participate in motility, nor
are they dynamic structures subject to rapid turnover. The most
important ones are neurofilaments (in neurons), keratin filaments
(mainly in epithelial cells), and vimentin filaments (in many
different cell types).

The biological function of both the cytoskeleton and the
contractile apparatus of a cell does not end at the cell
membrane. Cells must be embedded in the extracellular matrix,
all cells of a muscle must act as one single mechanical unit, and
epithelia must resist macroscopic mechanical forces. Hence, cell
adhesion and the extracellular matrix are closely connected to
the cytoskeleton. Vinculin is one of the proteins which serve as
an anchor for intracellular fibers (actin). Different types of
desmosomes and tight junctions connect neighbor cells with
intercellular fibers. On the inside, cytoplasmic plaques connect
them to the cytoskeleton. These structures, on the one hand,
serve as mechanical elements, whereas gap junctions, on the other
hand, connect cells metabolically.

The extracellular matrix consists of a network of proteins,
glycoproteins and polysaccharides. Different proteins are present
in relation to different mechanical demands: elastin is found
in tissues with high elasticity (lungs, heart), whereas collagen,
a more hard-wearing protein, is found in tendons and ligaments.
Fibronectin is an extracellular protein highly important for cell
adhesion.

Reference: Murray J et al (1992): Cell Motil Cytoskeleton
22: 211-223.

Within the overall group of Cell Structure and Motility 
several categories of proteins are coded for by clones of the 
invention: 

Ankyrins: Ankyrins are peripheral membrane proteins which
interconnect integral proteins with the spectrin-based membrane
skeleton. Thus these proteins are involved in coupling of
cytoskeleton and cell membrane. OMIM reports that ankyrins have
associations (as potentially diagnostic, therapeutic, causative,
and/or related, etc.) with the following diseases: 1) Hereditary
Spherocytosis (OMIM 182900); 2) Hemolytic Poikilocytic Anemia
due to reduced ankyrin binding sites (OMIM 141700); 3) Atypical
Elliptocytosis (OMIM 225450); 4) Autosomal recessive
spherocytosis (OMIM 270970); 5) Werner Syndrome (OMIM *277700);
and 6) Rhesus-unlinked type Elliptocytosis (OMIM #130600).
Ankyrin binding glycoproteins mediate ankyrin effects,
especially in neuronal adhesion and prostate tumour cell
transformation. Clones in this category include: amy2_121f n.

Tropomyosins are ubiquitous proteins of 35 to 45 kD
associated with the actin filaments of myofibrils and stress
fibers. They are involved in cardiomyopathies (OMIM *191030,
*191010, *190990, *600317). Clones in this category include:
tes3_lbb5.

Differentiation/Development 

Almost every multicellular organism originates from meiotic
cell divisions and the recombination of a paternal and a maternal
set of chromosomes. After fertilization of the egg, all cells of
a body originate from this one cell. Thus the cells of the
developing body are initially genetically alike. But
phenotypically they become very different. They are specialized
to a certain cell type and arranged in an organized pattern to a
certain type of tissue, and the whole structure has the
well-defined shape of an organ. All these features are determined
by the DNA sequence of the genome, which is reproduced in every
cell. Each cell acts on the genetic instructions given at a
certain time and at a certain place of development and plays its
individual part in the multicellular organism. Cell
differentiation may be divided into three general steps: cell
cycle exit, apoptosis protection and tissue specific gene
expression. These processes are coordinated to provide the final
and unique tissue characteristics.

An animal cell that has achieved a certain level of
development is said to be determined. This differentiation of a
cell may be irreversible, and in that case the cell may be renewed
only by simple duplication. Other cells are renewed by means of
stem cells, which are immortal (e.g. stem cells of the bone
marrow, epidermal stem cells). The genetic control of development
is extensively studied in non-vertebrates and vertebrates. The
classical animal model is the fruit fly Drosophila and the
modern model is the transgenic mouse. Animal transgenesis has
proven to be useful for physiological as well as
physiopathological studies. Besides the approach based on the
random integration of a DNA construct in the mouse genome, gene
targeting can be achieved using totipotent embryonic stem cells
for targeted transgenesis. Transgenic mice are then derived from
the embryonic stem cells. This allows the introduction of null
mutations in the genome (so-called knock-out) or the control of
the transgene expression by the endogenous regulatory sequence
of the gene of interest (so-called knock-in). Mice can be
created that express wild-type genes, mutant genes, marker genes
or cell lethal genes in a tissue specific manner. These animal
models make it possible to follow changes in tissue and organ
development and lead to a better understanding of the cellular
function of many genes or to the generation of animal models for
human diseases. Fundamental problems in immunology, onset and
development of cancer, regulation in fatty acid metabolism,
aspects of cardiovascular function, control of central nervous
system development, and analysis of reproductive development
and function are only some examples of research interests.

The final stage of cell differentiation is growth arrest.
In animal tissues with rapid cell turnover, terminally
differentiated cells undergo programmed cell death. The cells
have the ability to kill themselves by activating an intrinsic
cell suicide program when they are no longer needed or have
become seriously damaged. The execution of this program is
termed apoptosis. Apoptosis is of importance for development and
homeostasis of animals. The key components of this program have
been conserved in evolution from worms (C. elegans) to insects
(Drosophila) to humans. The roles of apoptosis include the
sculpting of structures during development, deletion of unneeded
cells and tissues, regulation of growth and cell number, and the
elimination of abnormal and potentially dangerous cells. In this
way apoptosis provides a "quality control mechanism" that limits
the accumulation of harmful cells, such as virus-infected cells
and tumor cells. On the other hand, inappropriate apoptosis is
associated with a wide variety of diseases, including AIDS,
neurodegenerative disorders and ischemic stroke. Because it is
now clear that apoptosis is a result of an active, gene-directed
process, it should eventually be possible to manipulate this form
of cell death by developing drugs that interact with its recently
identified mechanisms of action. Inducers of cell
differentiation, cell cycle arrest and apoptosis might be the
novel molecular targets for new anticancer agents in addition to
the signaling pathways for growth factors and cytokines.

Proteins, factors, receptors and genes of importance in
apoptosis:

Proteases: 

- Calpain, an intracellular cysteine protease, exact role
unknown.

- Caspase-1 to Caspase-11, a family of proteases synthesized
as inactive proenzymes. Targets of the activated enzymes
include: poly(ADP-ribose) polymerase, DNA-dependent protein
kinase, U1 ribonucleoprotein, nuclear lamins and cytoskeleton
components (actin).

- Granzyme B, a serine protease released by cytotoxic
T-cells.



Receptors: 

- CD95 (synonyms: Fas, APO-1), a receptor protein of the
TNF-receptor family which includes TNF-R1 and TNF-R2 with the
common characteristic of a 70 amino acid cytoplasmic domain.

- FADD (synonym: MORT-1), a cytoplasmic protein.

- DR-3 (synonym: APO-3), a member of the TNF-receptor family.

- DR-4 and DR-5.

Genes:

- ced-3, ced-4 and ced-9 encode the general apoptotic and
antiapoptotic program in Caenorhabditis elegans. Apaf-3 is the
mammalian homologue of ced-3.

- Bcl-2 / Bcl-xL / Bax / Bcl-xS / Bak, a large gene family
that can either inhibit or promote apoptosis.

- Cytokine response modifier A, a cowpox virus gene whose
gene product inhibits caspases.

Others : 

- Caspase-activated DNase (CAD) and its inhibitor (ICAD); CAD
causes DNA fragmentation in the nucleus.

- Ceramide, a complex lipid that acts as a second messenger.

- c-Jun N-terminal kinase (JNK), a proline-directed kinase.

- p53 protein, essential for the induction of apoptosis
as a response to chromosomal damage.

- RAIDD, a death signal-transducing protein.

- Receptor interacting protein (RIP), an accessory protein
with a death domain and a serine/threonine kinase activity.

- Sphingomyelinase, an enzyme that hydrolyzes the complex
lipid sphingomyelin to ceramide.

- Tumor necrosis factor (TNF), a type II membrane protein.

- TNF-receptor associated factor (TRAF2), an accessory
protein that can bind to both TNF-R1 and TNF-R2.

Within the overall group of Differentiation/Development,
several categories of proteins are coded for by clones of the
invention:

Notch family proteins: Notch family molecules are negative
regulators of neuronal differentiation in early brain
development. Clones in this category include: amy2_li2 l t.

Testis-specific Y-encoded proteins: The TSPY genes are
arranged in clusters on the Y chromosome of many mammalian
species. TSPY is believed to function in early spermatogenesis
and is a candidate for GBY, the putative gonadoblastoma-inducing
gene on the Y chromosome. Clones in this category include:
amy2_7j5.

Inflammation-mediating proteins: Inflammation is a basic
mechanism responsible for recruitment and activation of
immunocompetent cells. By various mediators, cells are activated
and triggered to differentiate. Hyperactivation of these pathways
leads to various disease states. In neuronal tissues, in
inflammatory diseases such as experimental autoimmune
encephalomyelitis (EAE), neuritis (EAN) and uveitis (EAU),
allograft inflammatory factor-1 is produced by macrophages and
microglia cells. Clones in this category include: amy2_2bn.

Intracellular transport and trafficking 
Eukaryotic cells rely for their viability on the
partitioning of many basic cellular processes into
membrane-bounded organelles. These are the nucleus, endoplasmic
reticulum (ER), Golgi apparatus, endosomes, lysosomal
compartments, mitochondria and peroxisomes. Most molecules
destined for the lysosome, cell surface and outside the cell are
routed through the ER and Golgi, which together with the
vesicular intermediates between them comprise the secretory
pathway (Palade, 1975). In the ER and Golgi compartments proteins
are sorted, modified and often assembled into complexes en route
to their final destination. Incorrectly assembled proteins are
retained in the ER until they fold correctly or are targeted for
degradation. Additional proteins are translocated into and
function within the lumenal spaces of organelles or are secreted.
Thus a large proportion of the proteins synthesized require
targeting to membranes either for insertion into or transport
across them. A major purpose of this is growth. The secretory
pathway is dependent on an intact cytoskeleton and is also
closely linked to general metabolism by affecting ribosome
biogenesis (Mizuta and Warner, 1994). A huge number of proteins
is required for targeting, translocation and sorting of newly
synthesized proteins.

The first step in sorting is the recognition of cis-acting
targeting or signal sequences that organelle-targeted proteins
contain. This is carried out by cytosolic targeting factors
and/or receptors on the membrane to which the protein is
targeted. In some cases the primary sequences are extremely
degenerate, with only the overall character being conserved
(hydrophobicity for an ER signal sequence, helical amphiphilicity
for a mitochondrial targeting sequence) (Kaiser et al., 1987;
Lemire et al., 1989). Following the targeting step, proteins are
either inserted into or transported across the membrane
(translocated) through a proteinaceous apparatus (termed the
translocon). The translocon includes or recruits motors to drive
the translocation process in the correct direction (Schatz and
Dobberstein, 1996).
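
The point that only the overall hydrophobic character of an ER
signal sequence is conserved can be illustrated by scoring a
peptide for its most hydrophobic stretch. The following is a
minimal sketch, assuming the Kyte-Doolittle hydropathy scale and
a sliding window; the function name, the window length and the
example peptides are illustrative assumptions and are not part of
the methods described in this specification.

```python
# Kyte-Doolittle hydropathy values (higher means more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def max_window_hydropathy(peptide, window=8):
    """Return the highest mean hydropathy over any window of the peptide."""
    peptide = peptide.upper()
    if len(peptide) < window:
        raise ValueError("peptide shorter than window")
    scores = [KD[aa] for aa in peptide]
    best = None
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if best is None or mean > best:
            best = mean
    return best
```

A peptide with a leucine-rich core, such as the made-up sequence
KKKKLLLLLLLLKKKK, scores far higher than a uniformly charged one,
which is the kind of degenerate, character-based signal the
paragraph above refers to.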
Defined intracellular protein transport steps:

• ER
  - targeting to the ER
  - translocation into the lumen of the ER and, depending on
    the presence of certain signals in the peptide sequence,
    transport through the Golgi complex
• Mitochondria
  - targeting
  - translocation
• Peroxisomes
• The general secretory pathway
  - protein modification, assembly and quality control in
    the ER
  - vesicle-mediated trafficking
  - vesicle docking and fusion
  - transport through the Golgi apparatus and sorting at
    the trans-Golgi
  - transport to the cell surface
  - transport routes to the lysosome
• Endocytosis
• Specialized protein transport routes
• Protein export from the cytoplasm

References: Palade, G (1975) Science 189: 347-358; Mizuta et
al. (1994) Mol Cell Biol 14: 2493-2502; Kaiser et al. (1987)
Science 235: 312-317; Lemire et al. (1989) J Biol Chem 264:
20206-20215; Schatz et al. (1996) Science 271: 1519-1526.

Rab proteins 

In eukaryotic cells the compartmentalisation of processes is
a prerequisite for their tight regulation. The cells contain a
highly dynamic set of membrane compartments that are responsible
for packaging, sorting, secreting, and recycling proteins and
other molecules. Trafficking between organelles within the
secretory pathway occurs as vesicles derived from a donor
compartment fuse with specific acceptor membranes, resulting in
the directional transfer of cargo molecules. This process is
tightly controlled by the Rab/Ypt family of proteins (reviewed by
Novick and Zerial, 1997), a branch of the superfamily of small
GTPases. Rab proteins regulate a variety of functions, including
vesicle translocation and docking at specific fusion sites. Rabs
may also play critical roles in higher order processes such as
modulating the levels of neurotransmitter release in neurons, a
likely mechanism in synaptic plasticity that underlies learning
and memory (Geppert and Südhof, 1998).

Small GTPases share a common three-dimensional fold that, in
the GTP bound state, can bind a variety of downstream effector
proteins. GTP hydrolysis leads to a conformational change in the
"switch" regions that renders the GTPase unrecognizable to its
effectors. In this way, by localizing and activating a select set
of effectors, a common structural motif is used to control a wide
array of distinct cellular processes.

The final steps in membrane fusion are likely to be driven
by a set of proteins known as SNAREs. After a vesicle becomes
docked, the cytoplasmic domains of VAMP (also termed
synaptobrevin) and syntaxin on opposing membranes, in combination
with a SNAP-25 molecule, coalesce into an elongated alpha-helical
bundle (Poirier et al., 1998; Sutton et al., 1998), which may
lead to fusion. Because numerous SNARE isoforms have been
identified that localize to distinct membrane compartments, it
was originally proposed that the specificity of interaction
between the SNARE proteins accounted for the specificity in
membrane trafficking. Recent results, however, suggest that
SNAREs are not specific in their ability to form complexes in
vitro, suggesting that trafficking specificity requires
additional factors (Yang et al., 1999). In this regard, Rab
proteins are strong candidates for governing the specificity of
vesicle trafficking. Like the SNAREs, many isoforms (>40) of the
Rab family have been identified that localize to specific
membrane compartments (reviewed by Novick and Zerial, 1997).

Concomitant with the SNARE cycle, Rab proteins undergo an
intricate cycle of membrane and protein interactions. Rabs are
posttranslationally modified at C-terminal cysteines by the
addition of two geranylgeranyl groups, which mediate membrane
association when the Rab is in the GTP-bound state. After guanine
nucleotide hydrolysis occurs, the Rab is extracted from the
membrane upon forming a complex with a cytosolic GDP-dissociation
inhibitor (GDI). This cytosolic intermediate is then recycled
onto a newly forming vesicle, most likely through a secondary
factor termed a GDI dissociation factor (GDF), which displaces
GDI. After the Rab becomes membrane bound, a guanine nucleotide
exchange factor (GEF) promotes release of GDP and the subsequent
loading of GTP. In its GTP-bound conformation, the Rab is then
free to associate with its specific set of effectors, which can
in turn trigger events leading to the eventual fusion of the
vesicle with a target membrane. To complete the cycle, perhaps
after or concurrent with membrane fusion, a GTPase activating
protein (GAP) accelerates nucleotide hydrolysis, switching off
the GTPase. The remaining GDP-bound Rab can then participate in a
new round of fusion.
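
The Rab cycle just described can be summarized as a small state
machine. The following is a minimal sketch for orientation only:
the state labels and function names are hypothetical shorthand
introduced here, and the ordering simply follows the sequence of
factors (GDF, GEF, effectors, GAP, GDI) given in the paragraph
above.

```python
# Each state maps to (factor acting on the Rab, resulting state).
RAB_CYCLE = {
    "gdi_complex": ("GDF", "membrane_gdp"),          # GDF displaces GDI
    "membrane_gdp": ("GEF", "membrane_gtp"),         # GDP released, GTP loaded
    "membrane_gtp": ("effectors", "effector_bound"), # effector association
    "effector_bound": ("GAP", "post_hydrolysis_gdp"),# GTP hydrolysis
    "post_hydrolysis_gdp": ("GDI", "gdi_complex"),   # extraction from membrane
}

def walk_cycle(start, steps):
    """Follow the cycle for a number of steps, recording each transition."""
    state = start
    path = []
    for _ in range(steps):
        factor, next_state = RAB_CYCLE[state]
        path.append((state, factor, next_state))
        state = next_state
    return path

def is_effector_competent(state):
    """Only GTP-bound states are recognized by effectors."""
    return state in {"membrane_gtp", "effector_bound"}
```

Five steps starting from "gdi_complex" return the Rab to its
cytosolic GDI complex, mirroring one complete round of the cycle.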

Rab interactions with effectors are likely to regulate
vesicle targeting and membrane fusion in three ways. First, a Rab
may specifically facilitate vectorial vesicle transport. Vesicles
are transported from their site of origin to acceptor
compartments, likely through associations with cytoskeletal
elements and transport motors. A protein has been identified with
a domain structure that suggests a connection between the
cytoskeleton and the Rabs. This protein, called Rabkinesin-6,
contains a kinesin-like ATPase motor domain followed by a
coiled-coil stalk region and a Rab-binding domain (RBD) that
specifically binds Rab6 (Echard et al., 1998). An additional link
with the cytoskeleton is provided by the Rab effector
Rabphilin-3A. Rabphilin-3A has been shown in vitro to interact
with alpha-actinin, an actin-bundling protein, but only when not
bound to Rab3A (Kato et al., 1996). These results raise the
intriguing possibility that Rab proteins regulate vesicle
interactions with the cytoskeleton and thereby play an active
role in targeting vesicles to their appropriate destinations.

Second, Rab proteins may regulate membrane trafficking at
the vesicle docking step. A number of Rab effectors, including
Rabaptin-5, EEA1, Rabphilin-3A, and Rim, may serve as molecular
tethers. Each effector protein contains an RBD, followed by a
linker region (some having the potential to form elongated
coiled-coil structures), and a domain capable of interacting with
a second Rab or the target membrane. Rabaptin-5, for example,
contains two RBDs, one near the N terminus that specifically
recognizes Rab4 and a second near the C terminus that binds Rab5
(Vitale et al., 1998). Both Rim, which is localized to the
target membrane, and Rabphilin-3A, which is localized to the
vesicle, contain N-terminal RBDs and C-terminal Ca2+-binding C2
domains, implicating these effectors in synaptic vesicle
localization or docking in response to Ca2+ influx (Wang et al.,
1997). Tethering effectors may also recognize protein complexes
on the acceptor membrane. Sec4p, a yeast Rab3A homolog, interacts
with the exocyst (Guo et al., 1999), a complex of seven or more
subunits that is assembled at sites of vesicle fusion along the
plasma membrane. The exocyst complex may therefore function as a
landmark for Rab/effector-mediated vesicle docking.

Third, once a vesicle has become tethered to its fusion
site, Rab proteins may selectively activate the SNARE fusion
machinery. The mechanism of this activation is unknown but may
involve direct interactions of Rabs or, more likely, their
effectors with SNAREs. For example, Hrs-2 is a protein that binds
to SNAP-25 and contains a Zn2+-finger motif characteristic of
Rab-binding proteins such as Rabphilin-3A, Rim, EEA1, and Noc2,
suggesting that Hrs-2 may form a physical link between Rabs and
SNAREs (Bean et al., 1997). In addition, certain mutations in the
syntaxin-binding protein Sly1p, the Sec1p homolog utilized in ER
to Golgi trafficking, eliminate the requirement for Ypt1p, a Rab
protein that functions at this trafficking step (Dascher et al.,
1991). Rabs may therefore regulate SNARE associations through
Sec1 family members. In support of this idea, a Rab effector was
recently found to interact with a vacuole Rab, a Sec1p homolog,
and a SNARE protein (Peterson et al., 1999), which suggests that
this effector serves to connect Rab and SNARE function. In this
way, Rabs and their effectors may facilitate the correct pairing
of SNAREs.

References: Dascher et al. (1991) Mol. Cell. Biol. 11, 872-885;
Echard et al. (1998) Science 279, 580-585; Geppert et al. (1998)
Annu. Rev. Neurosci. 21, 75-95; Guo et al. (1999) EMBO J. 18,
1071-1080; Kato et al. (1996) J. Biol. Chem. 271, 31775-31778;
Novick et al. (1997) Curr. Opin. Cell Biol. 9, 496-504;
Peterson et al. (1999) Curr. Biol. 159-162; Poirier et al. (1998)
Nat. Struct. Biol. 5, 765-769; Vitale et al. (1998) EMBO J. 17,
1941-1951; Wang et al. (1997) Nature 388, 593-598; Yang et al.
(1999) J. Biol. Chem. 274, 5649-5653.

Within the overall group of Intracellular Transport and 
Trafficking several categories of proteins are coded for by 
clones of the invention- 

Vesicular trafficking: Various proteins are involved in
trafficking of vesicles inside the cell and in the exocytotic
pathway. For example, Sec7 of Saccharomyces cerevisiae functions
in vesicular trafficking. Synaptotagmins are essential for
Ca(2+)-regulated exocytosis of neurosecretory vesicles. Other
proteins such as Dynamin are microtubule-associated
force-producing proteins, which are involved in the production of
microtubule bundles. By binding and subsequently hydrolysing
GTP, such proteins provide the motor for vesicular transport
during endocytosis. Clones in this category include: amyE?_mb5,
amy_2ol3 and fkdS_3kl.

Protein sorting: Protein sorting is a process essential for
the maintenance of a cell's functionality and structural
integrity. Most proteins perform their biological function in
special compartments in the cell. The process of sorting is
complex and highly regulated. Clones in this category include:
melE_7gm.

Metabolism 

This group includes proteins which are involved in the
uptake and consumption of nutrients, and enzymes which are part
of the biochemical pathways for energy metabolism or which are
involved in the supply of building blocks of nucleic acids and
proteins (NTPs, dNTPs, amino acids) for DNA/RNA and protein
synthesis, and fatty acids (membranes), to allow for the
generation of higher order structures. This group constitutes the
most important and largest group in prokaryotes and lower
eukaryotes. The higher the evolutionary level of an organism is,
however, the more other protein classes like "signal
transduction", "cell cycle" and "differentiation and development"
increase in importance and number of representatives.

Proteins involved in the metabolism of energy and compounds
(here: other than nucleic acids or proteins) are usually the
products of housekeeping genes; they are often constitutively
and/or ubiquitously expressed.

Several categories of proteins are coded for by clones of
the invention within the overall group of Metabolism:

Fatty acid metabolism: OMIM lists more than 50 diseases
caused by pathologically altered fatty acid metabolism.
1-acyl-glycerol-3-phosphate acyltransferase is involved in fatty
acid metabolism and is ubiquitously expressed, with a slight
predominance in uterus, placenta and foreskin. Clones in this
category include: amyE_EcE2

Repair and surveillance of protein damage: Several classes
of protein are involved in repair and surveillance of protein
damage. L-isoaspartyl methyltransferase (Pimt), as an example, is
a highly conserved enzyme utilising S-adenosylmethionine (AdoMet)
to methylate aspartate residues of proteins damaged by
age-related isomerisation and deamidation. Clones in this
category include: fbr2_7fli21.

Nucleic acid management 

The genetic information is stored in the form of nucleic
acids in all organisms. Two kinds of nucleic acids exist, DNA and
RNA. Whereas the more stable DNA in most organisms constitutes
the storage form of the genetic information, the labile RNA, and
in particular mRNA, is an intermediate used for the temporal
expression of specific genes.

In eukaryotes, DNA is usually a double-stranded linear
molecule consisting of two antiparallel strands and made up of
deoxyribose, a phosphate backbone and the four bases A, C, G,
and T. The DNA of some organisms has a ring structure. The
structure of DNA was unraveled years ago by Watson and Crick. DNA
is a directional molecule, its direction being determined by the
C-atoms of the sugar.

The most important processes dealing with nucleic acids are: 

• replication (e.g. DNA polymerases, telomerase)

• transcription (RNA polymerases) 

• RNA processing (maturation - splicing and degradation) 

• in addition, enzymes and proteins exist which require a nucleic
acid (mostly RNA) in the active center to be functional
(ribozymes - e.g. RNase, ribosomal proteins)

The DNA of a cell is replicated in the S-phase of the cell
cycle. Several enzymes carry out the task of doubling this
nucleic acid. Like all steps of the cell cycle, the process of
replication is tightly regulated. The enzyme DNA polymerase and
several other proteins are involved in this process. Whereas many
prokaryotes have only one origin of replication (i.e., the
starting point of the replication cycle), in eukaryotic DNAs
(chromosomes) multiple such start points exist. The switch from
the synthesis (S) phase to the subsequent G2 or M phases of the
cell cycle is dependent on the completion of replication.
This makes clear that a number of proteins are involved in the
replication itself as well as in the control of the process.
Since most eukaryotic chromosomes are linear structures,
additional proteins and enzymes are necessary to make sure that
the structure is maintained through successive generations. This
includes those proteins necessary to build the three-dimensional
structure of chromosomes (e.g. histones) and the structural
network of the nucleus and nucleolus (including the defined
localization of transcriptionally active genes in the vicinity of
nucleoli) but also such enzymes as telomerase, which guarantees
the integrity of the chromosomal ends.

The expression of genes is usually performed in two steps.
First, a messenger RNA (mRNA) is produced (transcribed) in one to
many copies, and second, this mRNA is translated into the protein
product. The regulation of transcription is discussed under the
separate heading "transcription factors", but also the classes
"signal transduction", "development", "cell cycle" and others are
affected, as the expression of certain genes determines the fate
of a cell or organism.

The primary transcript (hnRNA - heterogeneous nuclear RNA)
is a single-stranded one-to-one copy of the gene as it is located
on the chromosome. Before a protein can be translated, the
process of maturation is initiated, already during transcription.
Firstly, a 5' cap structure is enzymatically and covalently added
to the RNA, blocking the 5' end of the RNA. Second, when the RNA
polymerase has terminated polymerization, the enzyme poly A
polymerase adds varying numbers of adenine residues to the 3' end
of the transcript. This enzyme recognizes the sequence AAUAAA or
AUUAAA (+ some minor variations), cuts the RNA 10 - 30
nucleotides downstream and adds the A residues. The size of the
poly A sequence affects the stability of the RNA. Finally, in the
process of splicing, the introns present on the genomic level and
also present in the hnRNA are spliced out by a multi-protein
complex consisting of several proteins and RNAs. The finally
maturated mRNA is exported to the cytoplasm where it is
translated with help of the ribosomes.
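
The polyadenylation rule described above (a signal hexamer followed
by cleavage some 10 to 30 nucleotides downstream) can be illustrated
with a short sketch. It is not part of the invention; the function
name find_cleavage_windows and its parameters are hypothetical, only
the two hexamers named in the text are searched (the "minor
variations" are ignored), and the offsets are counted from the end of
the hexamer, which is an assumption.

```python
import re

# The two signal hexamers named in the text. A lookahead is used so that
# every occurrence is reported, even if occurrences were to overlap.
POLY_A_SIGNAL = re.compile(r"(?=(AAUAAA|AUUAAA))")


def find_cleavage_windows(rna, min_offset=10, max_offset=30):
    """Return a list of (signal, signal_start, window_start, window_end).

    window_start/window_end delimit the half-open index range, counted
    from the end of the hexamer, in which cleavage would be expected.
    Both values are clipped to the length of the sequence.
    """
    # Accept DNA-style input by converting T to U (illustrative convenience).
    rna = rna.upper().replace("T", "U")
    hits = []
    for match in POLY_A_SIGNAL.finditer(rna):
        signal = match.group(1)
        start = match.start()
        signal_end = start + len(signal)
        window_start = min(signal_end + min_offset, len(rna))
        window_end = min(signal_end + max_offset, len(rna))
        hits.append((signal, start, window_start, window_end))
    return hits
```

Such a scan only proposes candidate regions; it says nothing about
whether a given signal is actually used in a cell.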

The half life of RNA is usually much shorter than that of
DNA. Usually, the mRNA is degraded shortly after synthesis, to
guarantee a very defined window of expression of a given gene.
This regulation is necessary to specifically maintain or change
the set of proteins present at any time in a cell. Specific
regions in the 3' UTR (untranslated region) determine the
stability of the mRNA in the cytoplasm before it is degraded by
RNases, enzymes consisting both of protein and RNA.

References: Watson and Crick (1953) Nature 171: 737-738.

Several categories of proteins are coded for by clones of
the invention within the overall group of "Nucleic acid
management" and include, among others, the following:

Proteins induced by DNA damage: There are several distinct
pathways responsible for repair of DNA. Nucleotide excision
repair is the most versatile DNA repair pathway and is the main
defense of mammalian cells against UV-induced DNA damage. Defects
in proteins involved in this pathway can lead to inherited
disorders (such as xeroderma pigmentosum OMIM *278700, *278720,
*278740 and *194400; Cockayne's syndrome OMIM #216400 and
trichothiodystrophy OMIM #601675). Study of UV-sensitive yeast
RAD mutants has greatly aided this process and has revealed
strong conservation of the components of nucleotide excision
repair in eukaryotes. Clones in this category include: amyB_J>ln4
and tes3_10ilb.

Proteins involved in loading of transfer RNAs: Transfer RNAs
must be coupled to an amino acid, which then is transported to the
peptidyl-transferase centre of the ribosome. Clones in this
category include: fbr2_7flcl2.

Cytosolic ribosomal proteins: Several proteins are part of
the eukaryotic ribosomal peptidyl transferase center or modulate
the activity of this centre. Such proteins can find application
in modulation of ribosome assembly, maintenance and activity.
Clones in this category include: amy21il

Histones: Histones are DNA-binding proteins responsible not
only for DNA structure, folding and packing, but are also
thought to be involved in activation and silencing of large
chromosomal regions. Clones in this category include: tes3_3!LalD.

mRNA-binding proteins: mRNA-binding proteins are involved in
regulation of mRNA folding, translation and stability. For
example, the VILIP protein binds specifically to the
3' untranslated region of the neurotrophin receptor mRNA. Clones
in this group include amy2_2glE.

Signal transduction 

Cells in higher order organisms need to continuously
communicate with their environment, especially with other cells
of the same organism, in order to maintain the function and
specialization of the whole system these cells are part of. This
important task of communication is performed with the help of
cell-surface receptors which receive and transmit signals from
outside into the cell.

G-proteins

The largest known family of cell-surface receptors is that of the
G-protein-coupled receptors, which mediate the transmission of
diverse stimuli such as neurotransmitters, glycopeptides,
hormones, peptides, odorant molecules, and photons. The
functional unit of these receptors is composed of the receptor
molecule itself (GPCR), which is anchored in the cytoplasmic
membrane with seven membrane-spanning domains, the heterotrimeric
G-protein, which is composed of α- and βγ-subunits (Gα and Gβγ),
and the effectors that interact with Gα and/or Gβγ. In
particular, the dissociated Gα and Gβγ can regulate the
activities of a number of effector molecules such as adenylate
cyclases, phospholipase C isoforms, ion channels, and tyrosine
kinases, resulting in a variety of cellular functions. The
process of signal transduction must be tightly regulated and
reversible in order to avoid overstimulation, to achieve signal
termination, and to render the receptor responsive to subsequent
stimuli [Iacovelli, L. et al. (1999) FASEB J. 13; Hamm, H.E.
(1998) J. Biol. Chem. 273, 669-672].

G-proteins are GTPases that, upon binding of GTP, change
their conformation, which in turn unmasks structural motifs,
in particular the so-called effector loop, which can mediate the
interactions with target proteins, or effectors, of the GTPases.
This ability enables the GTPases to cycle between active,
GTP-bound and inactive, GDP-bound conformations and in the process
to function as molecular traffic lights in a multitude of signal
transduction pathways. The most important of these signal
transduction pathways that are regulated with the help of
G-proteins are that of the phospholipase C / protein kinase C and
that of the adenylate cyclase / protein kinase A.

The cycling of GTPases is tightly regulated by three main
classes of proteins: The exchange of bound GDP for fresh GTP is
facilitated by guanine nucleotide exchange factors (GEFs), the
hydrolysis of GTP to GDP is sped up by GTPase-activating
proteins (GAPs), and the dissociation of GDP from the GTPases is
inhibited by GDP dissociation inhibitors (GDIs) [Tapon and Hall
(1997) Curr. Opin. Cell Biol. 9, 86-92; Van Aelst and
D'Souza-Schorey (1997) Genes Dev. 11, 2295-2322].

SOCS family

A conserved motif that was originally identified in proteins
that negatively regulate the signaling action of cytokines was
termed the SOCS box, for Suppressor Of Cytokine Signaling. Based
on homology, five distinct structural protein classes have since
been identified that carry this motif. The function of most of
these proteins is presently not known. Common to the proteins is
only the SOCS box, which is located near the C-terminus of the
respective peptides. Recently, the SOCS box has been demonstrated
to induce binding of proteins to elongins B and C, which could
target the proteins (and bound substrates) to the proteasomal
protein degradation pathway [Kamura, T. et al. (1998) Genes Dev.
12, 3872-3881; Zhang, J.-G. et al. (1999) Proc. Natl. Acad. Sci.
USA 96, 2071-2076].

The class where the SOCS box was originally described
contains several members (SOCS-1 to SOCS-7 and CIS). In addition
to the SOCS box, these proteins also contain an SH2 (Src-homology
2) domain and a variable N-terminus. These SOCS proteins appear to
form part of a classical negative feedback loop that regulates
cytokine signal transduction. Upon cytokine stimulation,
expression of SOCS proteins is rapidly induced and the proteins
inhibit further cytokine action. The mode of action of the SOCS
proteins is variable. While SOCS-1 binds and inhibits the JAK
(Janus kinases) family of cytoplasmic protein kinases [Narazaki,
M. et al. (1998) Proc. Natl. Acad. Sci. USA 95, 13130-13134;
Nicholson, S.E. et al. (1999) EMBO J. 18, 375-385], CIS appears
to act by competing with signaling molecules such as the STATs
(Signal Transducers and Activators of Transcription) family for
binding to phosphorylated receptor cytoplasmic domains
[Yoshimura, A. et al. (1995) EMBO J. 14, 2816-2826; Matsumoto, A.
et al. (1997) Blood 89, 3148-3154].

A second class of SOCS box proteins additionally contains
WD-40 repeats, which were initially identified in the mouse WSB-1
and -2 proteins. The functions of WD-40 proteins are not
completely understood but seem to be rather divergent. In Cdc4p
the WD-40 repeats are probably necessary for binding the
substrate for Cdc34p [Mathias, N. et al. (1999) Mol. Cell. Biol.
19, 1759-1767]. Cdc4p is a component of a ubiquitin ligase that
tethers the ubiquitin-conjugating enzyme Cdc34p to its
substrates. The posttranslational modification of a protein by
ubiquitin usually results in rapid degradation of the
ubiquitinated protein by the proteasome. The transfer of
ubiquitin to substrate is a multistep process in which WD-40
repeats might play an important function. Other WD-40 containing
proteins (e.g. the retinoblastoma binding protein RbAp48) have
been shown to bind metal ions (zinc), and this metal binding
might mediate and/or regulate protein-protein interactions which
are functionally important in chromatin metabolism [Kenzior, A.L.
and Folk, W.R. (1998) FEBS Lett. 440, 425-429]. These proteins
are involved in the RAS-cAMP pathway that regulates cellular
growth [Ach, R.A. et al. (1997) Plant Cell 9, 1595-1606].

The SPRY domain has been identified in pyrin or marenostrin,
a protein which is mutated in patients with Mediterranean fever
and which is similar to the butyrophilin family. While
butyrophilins seem to be involved in the lactation process in
mammals, the function of pyrin is unknown. Three proteins (SSB-1
to -3) have been identified to contain both SPRY and SOCS box
motifs. The function of these proteins is also not known.

Ankyrin repeat containing proteins share a 33-residue
repeating motif, an L-shaped structure with protruding β-hairpin
tips which mediate specific macromolecular interactions with
cytoskeletal, membrane, and regulatory proteins. These proteins
play fundamental roles in diverse biological activities including
growth and development, intracellular protein trafficking, the
establishment and maintenance of cellular polarity, cell adhesion,
signal transduction, and mRNA transcription. Three proteins that
contain ankyrin repeats (ASB-1 to -3) have been identified to
contain a C-terminal SOCS box in addition to the ankyrin
repeats. The function of these proteins or the individual domains
remains to be discovered [Hilton, D.J. et al. (1998) Proc. Natl.
Acad. Sci. USA 95, 114-119].

A few small GTPases (RAR and RAR-like) also contain a
SOCS box. GTPases are involved in signal transduction during
cellular communication. The function of the SOCS box in this type
of protein is currently unclear [Hilton, D.J. et al. (1998)
Proc. Natl. Acad. Sci. USA 95, 114-119].

Ca2+ as second messenger

The bivalent cation Ca2+ is, besides cAMP, one of the two
major second messengers in eukaryotic cells. Its intracellular
concentration is tightly regulated and usually kept very low
compared to the cell's environment. Ca2+ binding proteins and
transporters (gap junction, voltage-gated, second
messenger-gated) help to sequester huge amounts of the ion in
various organelles from where Ca2+ can be released upon
extracellular stimuli. For example, the contraction of muscle is
dependent on the presence of Ca2+ ions, which are readily
transported back into the organelles in order for the muscle to
relax. In signal transduction, Ca2+ functions as a second
messenger that activates Ca2+ dependent processes through the
activation of Ca2+/calmodulin dependent protein kinases (CaM
kinases), which are the major effector molecules of Ca2+. In the
signaling cascades, the CaM dependent kinases activate
phospholipases (e.g. phospholipase C) that in turn activate other
protein kinases such as protein kinase C.

cAMP 

Cyclic AMP is produced by the enzyme adenylate cyclase
in response to extracellular signals. Certain G-proteins
stimulate the activity of adenylate cyclase, which converts ATP
to cAMP and PPi. Two molecules of cAMP bind to each of the two
regulatory subunits of cAMP dependent protein kinase, which in
turn dissociate from the two catalytic subunits of the
heterotetramer R2C2. Upon release of the C-subunits, they become
active and phosphorylate substrate proteins at Ser and Thr
residues. The process leading from binding of extracellular
molecules to their receptors, the transmission of the stimuli
into the cell, the activation of adenylate cyclase and the
subsequent activation of cAMP dependent protein kinase is one of
two major signal transduction pathways in eukaryotic cells. Since
the phosphorylation of proteins is a post-translational
modification of proteins, the kinases are described in the class
"signal transduction".

SARA 

Members of the transforming growth factor β (TGFβ)
superfamily signal through a family of cell-surface transmembrane
serine/threonine kinases, known as type I and type II receptors
(Heldin et al., 1997; Attisano and Wrana, 1998; Kretzschmar and
Massagué, 1998). Ligand induces formation of heteromeric
complexes of these receptors, and signaling is initiated when
receptor I is phosphorylated and activated by the constitutively
active kinase of receptor II (Wrana et al., 1994). The activated
type I receptor kinase then propagates the signal to a family of
intracellular signaling mediators known as Smads (contraction of
the C. elegans Sma and Drosophila Mad genes, which were the first
identified members of this class of signaling effectors).

Three classes of Smads with distinct functions have been
defined: the receptor-regulated Smads, which include Smad1, 2, 3,
5, and 8; the common mediator Smad, Smad4; and the antagonistic
Smads, which include Smad6 and 7 (Heldin et al., 1997; Attisano
and Wrana, 1998; Kretzschmar and Massagué, 1998).
Receptor-regulated Smads (R-Smads) act as direct substrates of
specific type I receptors, and the proteins are phosphorylated on
the last two serines at the carboxyl terminus within a highly
conserved SSXS motif (Macias-Silva et al., 1996; Abdollah et al.,
1997; Kretzschmar et al., 1997; Liu et al., 1997b;
Souchelnytskyi et al., 1997). Regulation of R-Smads by the
receptor kinase provides an important level of specificity in
this system. Thus, Smad2 and Smad3 are substrates of TGFβ or
activin receptors and mediate signaling by these ligands
(Macias-Silva et al., 1996; Liu et al., 1997b; Nakao et al.,
1997), whereas Smad1, 5, and 8 are targets of BMP receptors and
propagate BMP signals (Hoodless et al., 1996; Chen et al., 1997b;
Kretzschmar et al., 1997; Nishimura et al., 1998). Once
phosphorylated, R-Smads associate with the common Smad, Smad4
(Lagna et al., 1996; Zhang et al., 1997), and mediate nuclear
translocation of the heteromeric complex. In the nucleus, Smad
complexes then activate specific genes through cooperative
interactions with DNA and other DNA-binding proteins such as
FAST1, FAST2, and Fos/Jun (Chen et al., 1996; Chen et al., 1997a;
Liu et al., 1997a; Labbé et al., 1998; Zhang et al., 1998; Zhou
et al., 1998). In contrast to R-Smads and Smad4, the antagonistic
Smads, Smad6 and 7, appear to function by blocking
ligand-dependent signaling (reviewed in Heldin et al., 1997).

Phosphorylation of R-Smads by the type I receptor is
essential for activating the TGFβ signaling pathway (Heldin et
al., 1997; Attisano and Wrana, 1998; Kretzschmar and Massagué,
1998). However, little is known of how Smad interaction with
receptors is controlled. A novel Smad2/Smad3 interacting protein
has been described (Tsukazaki, T. et al., 1998) that contains a
double zinc finger, or FYVE domain, and which has been called
SARA (Smad anchor for receptor activation). The SARA motif
recruits Smad2 into distinct subcellular domains and co-localizes
and interacts with TGFβ receptors. TGFβ signaling induces
dissociation of Smad2 from SARA with concomitant formation of
Smad2/Smad4 complexes and nuclear translocation. Moreover,
deletion of the FYVE domain in SARA causes mislocalization of
Smad2 and inhibits TGFβ-dependent transcriptional responses.
Thus, SARA defines a component of TGFβ signaling that functions
to recruit Smad2 to the receptor by controlling the subcellular
localization of Smad.

References: Abdollah et al. (1997) J. Biol. Chem. 272,
27678-27685; Attisano et al. (1998) Curr. Opin. Cell Biol. 10,
188-194; Chen et al. (1996) Nature 383, 691-696; Chen et al.
(1997a) Nature 389, 85-89; Chen et al. (1997b) Proc. Natl.
Acad. Sci. USA 94, 12938-12943; Heldin et al. (1997) Nature
390, 465-471; Hoodless et al. (1996) Cell 85, 489-500;
Kretzschmar et al. (1998) Curr. Opin. Genet. Dev. 8, 103-111;
Kretzschmar et al. (1997) Genes Dev. 11, 984-995; Labbé et al.
(1998) Mol. Cell 2, 109-120; Lagna et al. (1996) Nature 383,
832-836; Liu et al. (1997a) Genes Dev. 3157-3167; Liu et
al. (1997b) Proc. Natl. Acad. Sci. USA 94, 10669-10674;
Macias-Silva et al. (1996) Cell 87, 1215-1224; Nakao et al.
(1997) EMBO J. 16, 5353-5362; Nishimura et al. (1998) J. Biol.
Chem. 273, 1872-1879; Souchelnytskyi et al. (1997) J. Biol. Chem.
272, 28107-28115; Tsukazaki et al. (1998) Cell 779-791;
Wrana et al. (1994) Nature 370, 341-347; Zhang et al. (1997)
Curr. Biol. 7, 270-276; Zhang et al. (1998) Nature 394,
909-913; Zhou et al. (1998) Mol. Cell 2, 121-127.

Calcium

The bivalent cation Ca2+ is, along with cAMP, one of the two
major second messengers in eukaryotic cells. Its intracellular
concentration is tightly regulated and usually kept very low
compared to the cell's environment. Ca2+ binding proteins and
transporters (gap junction, voltage-gated, second
messenger-gated) help to sequester huge amounts of the ion in
various organelles from where Ca2+ can be released upon
extracellular stimuli. For example, the contraction of muscle is
dependent on the presence of Ca2+ ions, which are readily
transported back into the organelles in order for the muscle to
relax. In signal transduction, Ca2+ functions as a second
messenger that activates Ca2+ dependent processes through the
activation of Ca2+/calmodulin dependent protein kinases (CaM
kinases), which are the major effector molecules of Ca2+. In the
signaling cascades, the CaM dependent kinases activate
phospholipases (e.g. phospholipase C) that in turn activate other
protein kinases such as protein kinase C.

Rab proteins

In eukaryotic cells the compartmentalization of processes is
a prerequisite for a tight regulation of processes and
activities. The cells contain a highly dynamic set of membrane
compartments that are responsible for packaging, sorting,
secreting, and recycling proteins and other molecules.
Trafficking between organelles within the secretory pathway
occurs as vesicles derived from a donor compartment fuse with
specific acceptor membranes, resulting in the directional
transfer of cargo molecules. This process is tightly controlled
by the Rab/Ypt family of proteins (reviewed by Novick and Zerial,
1997), a branch of the superfamily of small GTPases. Rab
proteins regulate a variety of functions, including vesicle
translocation and docking at specific fusion sites. Rabs may also
play critical roles in higher order processes such as modulating
the levels of neurotransmitter release in neurons, a likely
mechanism in synaptic plasticity that underlies learning and
memory (Geppert and Südhof, 1998).

Small GTPases share a common three-dimensional fold that, in
the GTP-bound state, can bind a variety of downstream effector
proteins. GTP hydrolysis leads to a conformational change in the
"switch" regions that renders the GTPase unrecognizable to its
effectors. In this way, by localizing and activating a select set
of effectors, a common structural motif is used to control a wide
array of distinct cellular processes.

The final steps in membrane fusion are likely to be driven
by a set of proteins known as SNAREs. After a vesicle becomes
docked, the cytoplasmic domains of VAMP (also termed
synaptobrevin) and syntaxin on opposing membranes, in combination
with a SNAP-25 molecule, coalesce into an elongated α-helical
bundle (Poirier et al., 1998; Sutton et al., 1998), which may
lead to fusion. Because numerous SNARE isoforms have been
identified that localize to distinct membrane compartments, it
was originally proposed that the specificity of interaction
between the SNARE proteins accounted for the specificity in
membrane trafficking. Recent results, however, suggest that
SNAREs are not specific in their ability to form complexes in
vitro, suggesting that trafficking specificity requires
additional factors (Yang et al., 1999). In this regard, Rab
proteins are strong candidates for governing the specificity of
vesicle trafficking. Like the SNAREs, many isoforms (40) of the
Rab family have been identified that localize to specific
membrane compartments (reviewed by Novick and Zerial, 1997).

Concomitant with the SNARE cycle, Rab proteins undergo an
intricate cycle of membrane and protein interactions. Rabs are
posttranslationally modified at C-terminal cysteines by the
addition of two geranylgeranyl groups, which mediate membrane
association when the Rab is in the GTP-bound state. After guanine
nucleotide hydrolysis occurs, the Rab is extracted from the
membrane upon forming a complex with a cytosolic GDP-dissociation
inhibitor (GDI). This cytosolic intermediate is then recycled
onto a newly forming vesicle, most likely through a secondary
factor termed a GDI dissociation factor (GDF), which displaces
GDI. After the Rab becomes membrane bound, a guanine nucleotide
exchange factor (GEF) promotes release of GDP and the subsequent
loading of GTP. In its GTP-bound conformation, the Rab is then
free to associate with its specific set of effectors, which can
in turn trigger events leading to the eventual fusion of the
vesicle with a target membrane. To complete the cycle, perhaps
after or concurrent with membrane fusion, a GTPase activating
protein (GAP) accelerates nucleotide hydrolysis, switching off
the GTPase. The remaining GDP-bound Rab can then participate in a
new round of fusion.
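
The Rab cycle described above can be summarized as a small state
machine. The sketch below is only an illustration of the order of
events given in the text; the three state labels, the function names
and the reduction of each regulator (GAP, GDI, GDF, GEF) to a single
transition are simplifying assumptions and do not describe any
measured kinetics.

```python
# Hypothetical state labels for a Rab protein, following the text.
GTP_MEMBRANE = "GTP-bound, membrane-associated"
GDP_MEMBRANE = "GDP-bound, membrane-associated"
GDP_CYTOSOL = "GDP-bound, cytosolic complex with GDI"

# (current state, regulator) -> next state
TRANSITIONS = {
    (GTP_MEMBRANE, "GAP"): GDP_MEMBRANE,  # GAP accelerates GTP hydrolysis
    (GDP_MEMBRANE, "GDI"): GDP_CYTOSOL,   # GDI extracts the Rab from the membrane
    (GDP_CYTOSOL, "GDF"): GDP_MEMBRANE,   # GDF displaces GDI on a new vesicle
    (GDP_MEMBRANE, "GEF"): GTP_MEMBRANE,  # GEF exchanges GDP for GTP
}


def step(state, regulator):
    """Apply one regulator; the state is unchanged if it cannot act."""
    return TRANSITIONS.get((state, regulator), state)


def run_cycle(state, regulators):
    """Apply a sequence of regulators in order and return the final state."""
    for regulator in regulators:
        state = step(state, regulator)
    return state


def can_bind_effectors(state):
    """Only the GTP-bound form is recognized by effectors."""
    return state == GTP_MEMBRANE
```

Applying GAP, GDI, GDF and GEF in that order to a GTP-bound Rab
returns it to the GTP-bound, effector-competent state, which mirrors
one full round of the cycle.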

Rab interactions with effectors are likely to regulate
vesicle targeting and membrane fusion in three ways. First, a Rab
may specifically facilitate vectorial vesicle transport. Vesicles
are transported from their site of origin to acceptor
compartments, likely through associations with cytoskeletal
elements and transport motors. A protein has been identified with
a domain structure that suggests a connection between the
cytoskeleton and the Rabs. This protein, called Rabkinesin-6,
contains a kinesin-like ATPase motor domain followed by a
coiled-coil stalk region and a RBD that specifically binds Rab6
(Echard et al., 1998). An additional link with the cytoskeleton
is provided by the Rab effector, Rabphilin-3A. Rabphilin-3A has
been shown in vitro to interact with α-actinin, an actin-bundling
protein, but only when not bound to Rab3A (Kato et al., 1996).
These results raise the intriguing possibility that Rab proteins
regulate vesicle interactions with the cytoskeleton and thereby
play an active role in targeting vesicles to their appropriate
destinations.

Second, Rab proteins may regulate membrane trafficking at
the vesicle docking step. A number of Rab effectors, including
Rabaptin-5, EEA1, Rabphilin-3A, and Rim, may serve as molecular
tethers. Each effector protein contains a RBD, followed by a
linker region (some having the potential to form elongated
coiled-coil structures), and a domain capable of interacting with
a second Rab or the target membrane. Rabaptin-5, for example,
contains two RBDs, one near the N terminus that specifically
recognizes Rab4 and a second near the C terminus that binds Rab5
(Vitale et al., 1998). Both Rim, which is localized to the
target membrane, and Rabphilin-3A, which is localized to the
vesicle, contain N-terminal RBDs and C-terminal Ca2+-binding C2
domains, implicating these effectors in synaptic vesicle
localization or docking in response to Ca2+ influx (Wang et al.,
1997). Tethering effectors may also recognize protein complexes
on the acceptor membrane. Sec4p, a yeast Rab3A homolog, interacts
with the exocyst (Guo et al., 1999), a complex of seven or more
subunits that is assembled at sites of vesicle fusion along the
plasma membrane. The exocyst complex may therefore function as a
landmark for Rab/effector-mediated vesicle docking.

Third, once a vesicle has become tethered to its fusion
site, Rab proteins may selectively activate the SNARE fusion
machinery. The mechanism of this activation is unknown but may
involve direct interactions of Rabs or, more likely, their
effectors with SNAREs. For example, Hrs-2 is a protein that binds
to SNAP-25 and contains a Zn2+-finger motif characteristic of
Rab-binding proteins such as Rabphilin-3A, Rim, EEA1, and Noc2,
suggesting that Hrs-2 may form a physical link between Rabs and
SNAREs (Bean et al.). In addition, certain mutations in the
syntaxin-binding protein Sly1p, the Sec1p homolog utilized in ER
to Golgi trafficking, eliminate the requirement for Ypt1p, a Rab
protein that functions at this trafficking step (Dascher et al.,
1991). Rabs may therefore regulate SNARE associations through
Sec1 family members. In support of this idea, a Rab effector was
recently found to interact with a vacuole Rab, a Sec1p homolog,
and a SNARE protein (Peterson et al., 1999), which suggests that
this effector serves to connect Rab and SNARE function. In this
way, Rabs and their effectors may facilitate the correct pairing
of SNAREs.

References: Dascher et al. (1991). Mol. Cell. Biol. 11, 872-885;
Echard et al. (1998). Science 279, 580-585; Geppert et al.
(1998). Annu. Rev. Neurosci. 21, 75-95; Guo et al. (1999). EMBO J.
18, 1071-1080; Kato et al. (1996). J. Biol. Chem. 271,
31775-31778; Novick et al. (1997). Curr. Opin. Cell Biol. 496-504;
Peterson et al. (1999). Curr. Biol. 159-162; Poirier et al.
(1998). Nat. Struct. Biol. 5, 765-769; Vitale et al. (1998). EMBO
J. 17, 1941-1951; Wang et al. (1997). Nature 388, 593-598; Yang
et al. (1999). J. Biol. Chem. 5649-5653.

Kinases

Reversible posttranslational modifications of proteins are a
major means of regulating cellular activities. Among the various
modifications that are carried out by cells, the addition of
phosphoryl groups to Ser/Thr or Tyr residues is the most
important and widely used. The phosphorylation of proteins is
accomplished by protein kinases, while the reverse reaction, the
removal of phosphoryl groups, is carried out by phosphatases.
Kinases and phosphatases regulate key steps, e.g. in the
processes of cell proliferation, differentiation and
communication/signaling. These processes must be tightly
regulated in order to keep the cell in a steady state.
Mis-regulation of kinase activities (or that of phosphatases) is
held responsible for a multitude of disease processes such as
oncogenesis, inflammatory processes, arteriosclerosis, and
psoriasis.

Protein kinases constitute the largest protein family that
is currently known. Several hundred kinases have been identified
already. Classically, kinases are subdivided into two classes
based on the amino acid residues in their substrates that are
phosphorylated by the particular enzymes. The kinases
specifically add phosphoryl groups from adenosine triphosphate
(ATP) or, less frequently, guanosine triphosphate (GTP), either
to serine and/or threonine or to tyrosine residues of substrate
proteins. An estimated 1,000 to 10,000 proteins present in a
typical mammalian cell are believed to be regulated by the
action of protein kinases.

Protein kinases are frequently integral parts of signaling
cascades that transmit extracellular stimuli (e.g. hormones,
neurotransmitters, growth or differentiation factors) into the
cell and result in various responses by the cells. The kinases
play key roles in these cascades as they constitute a sort of
"molecular switch", turning on or off the activities of other
enzymes and proteins, e.g. metabolic and regulatory proteins,
channels and pumps, receptors, cytoskeletal proteins and
transcription factors.

The regulation of kinase activities is accomplished by
various means:

The best characterized example for the regulation via
regulatory subunits is the cAMP-dependent protein kinase (PKA),
which is also a prototype for second messenger activated protein
kinases. This enzyme consists of a heterotetramer of two
catalytic (C) and two regulatory (R) subunits. Upon binding of
two molecules of second messenger (cAMP) to each R subunit, the
catalytic subunits are released and active. Several isoforms
exist of both the catalytic and the regulatory subunits. The
combination of catalytic and regulatory subunits determines the
localization of the holoenzyme and also the substrate spectrum
that is available for phosphorylation. The consensus pattern
that must be present in the substrate for PKA action is RRXS/T,
where X can be any amino acid.
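
As an illustration of how the RRXS/T consensus can be applied to a
protein sequence, the following Python sketch scans a one-letter
amino acid string for candidate sites. The function name and the
example peptide are hypothetical, and a consensus match is only a
candidate site, not evidence of phosphorylation.

```python
import re

# R-R-X-[S/T]; the lookahead lets overlapping matches all be reported.
PKA_CONSENSUS = re.compile(r"(?=(RR[A-Z][ST]))")


def find_pka_sites(sequence):
    """Return (position, motif) pairs for every RRXS/T match.

    position is the 1-based index of the Ser/Thr that would be
    phosphorylated; motif is the four-residue match itself.
    """
    seq = sequence.upper()
    return [(m.start() + 4, m.group(1)) for m in PKA_CONSENSUS.finditer(seq)]


if __name__ == "__main__":
    peptide = "MKRRASLLRRRGTA"  # hypothetical test peptide
    for position, motif in find_pka_sites(peptide):
        print(position, motif)
```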

Casein kinase II is another example of a holoenzyme that
consists of catalytic and regulatory subunits.
Other kinases that are activated by second messengers are
cGMP-dependent protein kinase and protein kinase C (PKC), which is
activated by diacylglycerol, which in turn is produced by
phospholipases by cleavage of phosphatidylcholine.

Receptor kinases usually consist of an extracellular domain
which can bind effector molecules (e.g. growth factors and
hormones) and transfer the stimulus to the intracellular domain
of these proteins, which usually is a protein tyrosine kinase.
Other tyrosine kinases lack an extracellular domain but are
associated with receptors which transfer the signal after
effector binding by activating the associated protein kinase
enzyme (e.g. Src kinase family: Src, Blk, Fgr, Fyn, Lck, Lyn, Yes;
and Janus kinase family: Jak1-3, Tyk2).

Dysfunction of kinases, e.g. caused by non-functioning
regulation, can be the cause of inflammatory diseases and
uncontrolled proliferation. v-Src, which is a truncated version of
the c-Src proto-oncogene tyrosine kinase, is a classical example
for this process, as v-Src does not contain the regulatory domain
of the cellular gene and is thus constitutively active.

Several categories of proteins are coded for by clones of
the invention within the overall group of "Signal
transduction" and include, among others, the following:

Discs-large family: In Drosophila more than 50 genes are
described in which mutation leads to loss of cell proliferation
control, indicating that they are tumor suppressor genes. Most of
these genes have mammalian homologs. The Drosophila 'discs large'
tumor suppressor protein, Dlg, is the prototype of a family of
proteins termed MAGUKs (membrane-associated guanylate kinase
homologs). MAGUKs are localized at the membrane-cytoskeleton
interface, usually at cell-cell junctions, where they appear to
have both structural and signaling roles. They contain several
distinct domains, including a modified guanylate kinase domain,
an SH3 motif, and 1 or 3 copies of the DHR (GLGF/PDZ) domain.
Recessive lethal mutations in the 'discs large' tumor suppressor
gene interfere with the formation of septate junctions (thought
to be the arthropod equivalent of tight junctions) between
epithelial cells, and they also cause neoplastic overgrowth of
imaginal discs, suggesting a role for cell junctions in
proliferation control. These proteins can find application in
modulating/blocking the guanylate cyclase pathway. Clones in
this category include: amy2_12d7.

Proteins with a WW Domain: Proteins that contain a WW
domain, which was originally described as a short conserved
region in a number of unrelated proteins, among them dystrophin,
the gene responsible for Duchenne muscular dystrophy. The domain,
which spans about 35 residues, is repeated up to 4 times in some
proteins. It has been shown to bind proteins with particular
proline motifs, [AP]-P-P-[AP]-Y, and thus resembles somewhat SH3
domains. This domain is frequently associated with other domains
typical for proteins in signal transduction processes. Examples
of proteins containing the WW domain are dystrophin, utrophin,
vertebrate YAP protein (binds the SH3 domain of the Yes
oncoprotein), murine NEDD-4 (embryonic development and
differentiation of the central nervous system), and IQGAP (human
GTPase activating protein acting on ras). Therefore these
proteins should be involved in intracellular signal transduction.
Diseases associated (as potentially diagnostic, therapeutic,
causative, and/or related, etc.) with these proteins include, as
reported by OMIM: 1) Muscular Dystrophy, Pseudohypertrophic
Progressive, Duchenne and Becker Types (OMIM *310200). Clones in
this category include: tes3_lld21.

Ion Transporters: For signalling, stringent control of ion
fluxes over biological membranes is of the essence. Several
trans-membrane ion-channel proteins are key elements of signal
transduction pathways. Clones in this category include: amy2_10p?
and amy2_2fia.

RING-finger proteins: A zinc finger motif of the C3HC4 type
(the so-called RING finger domain) is involved in mediating
protein-protein interactions. Proteins containing a RING finger
are: mammalian V(D)J recombination activating protein (RAG1),
mouse rpt-1, human rfp, human 52 Kd Ro/SS-A protein and others.
The family of RING finger proteins contains a number of
oncogenes, for example PML, a probable transcription factor;
BRCA1; and the mammalian cbl and bmi-1 proto-oncogenes. Clones in
this category include: amy2_li0hl7.

Phosphatases: Proper targeting of protein tyrosine
phosphatases (PTPs) is essential for many cellular signalling
events, including antigen-induced proliferative responses of B
and T cells. The physiological significance of PTPs is further
unveiled through mouse gene knockout studies and human genome
sequencing and mapping projects. Several PTPs are shown to be
critical in the pathogenesis of human diseases, as shown by over
e^D entries in OMIM. Clones in this category include: tes3_31j20.

Phosphoproteins: Some paraneoplastic syndromes affecting the
nervous system are associated with antibodies that react with
neuronal proteins and the causal tumor (onconeuronal antigens).
Several of these antibodies are markers of specific neurologic
syndromes associated with distinct types of cancer. One of the
antigens recognised by such antibodies is Ma1, the neuron- and
testis-specific protein 1. The expression of Ma1 mRNA is highly
restricted to the brain and testis. Subsequent analysis suggested
that Ma1 is likely to be a phosphoprotein (see OMIM *604010).
Clones in this category include: tes3_5kB2.

Transmembrane proteins 

Membrane region prediction was effected using the ALOM2
software (Klein et al., 1985; version 2 by K. Nakai). Similar to
many other methods, the Kyte & Doolittle (1982) amino acid
hydrophobicity scale is used in ALOM2 as the primary variable for
classifying sequences in terms of their localization. High
prediction accuracy is achieved through the system of intelligent
decision rules and the utilization of a carefully selected
training data set. The method also generates reliability
estimates, which makes it possible to distinguish between
membrane-spanning proteins (I, intrinsic) and globular proteins
with regions of high hydrophobicity buried in the core.

For a protein of length L, the block of length l with
maximum hydrophobicity is found:

    \max H = \max_{k} \frac{1}{l} \sum_{i=k}^{k+l-1} H_i

where H_i represents the hydrophobicity of an individual
residue.
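
The search for the most hydrophobic block can be sketched in Python
as follows. This is a minimal illustration and not the ALOM2 code:
the function name is hypothetical, the default window of 17 is taken
from the example given later in this section, and the table holds the
published Kyte & Doolittle values.

```python
# Kyte & Doolittle (1982) hydropathy index, one-letter amino acid codes.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_window_hydrophobicity(sequence, window=17):
    """Return (maxH, start) for the most hydrophobic block.

    maxH is the mean hydropathy of the best block of `window`
    residues; start is its 0-based position. Residues outside the
    20 standard amino acids raise KeyError.
    """
    seq = sequence.upper()
    if len(seq) < window:
        raise ValueError("sequence is shorter than the window")
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    running = sum(values[:window])          # sum of the first block
    best_sum, best_start = running, 0
    for k in range(1, len(values) - window + 1):
        # slide by one residue: add the new one, drop the old one
        running += values[k + window - 1] - values[k - 1]
        if running > best_sum:
            best_sum, best_start = running, k
    return best_sum / window, best_start


if __name__ == "__main__":
    demo = "DDDD" + "L" * 17 + "KKKK"  # hypothetical test sequence
    print(max_window_hydrophobicity(demo))
```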

Let P(I/maxH) and P(E/maxH) be the conditional probabilities
that a protein is integral or peripheral, respectively, given its
value of maximal hydrophobicity maxH, and let P(I) and P(E) be
the prior probabilities of intrinsic and extrinsic membrane
proteins estimated from the training set. Then a sequence is
assigned to E if

    P(E/maxH) > P(I/maxH)

or, after applying the Bayes rule,

    P(E)P(maxH/E) > P(I)P(maxH/I)

where the conditional probabilities P(maxH/E) and P(maxH/I)
can be determined based on the estimates of probability
distributions of maxH in both groups.
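
The decision rule can be illustrated with a short Python sketch. It
assumes, for the purpose of the example only, that maxH is normally
distributed within each group; the means, standard deviations and
priors shown are invented placeholders, not values from the training
set.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution (assumed class-conditional form)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def classify(max_h, prior_e, prior_i, dist_e, dist_i):
    """Assign E or I by comparing P(E)P(maxH/E) with P(I)P(maxH/I).

    dist_e and dist_i are (mean, sd) pairs. Returns the label and the
    odds P(E/maxH):P(I/maxH). Values of maxH so extreme that the I
    density underflows to zero would need separate handling.
    """
    score_e = prior_e * gaussian_pdf(max_h, *dist_e)
    score_i = prior_i * gaussian_pdf(max_h, *dist_i)
    label = "E" if score_e > score_i else "I"
    return label, score_e / score_i


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to make the example run.
    params = dict(prior_e=0.7, prior_i=0.3,
                  dist_e=(1.0, 0.5), dist_i=(2.5, 0.5))
    for max_h in (0.8, 2.9):
        print(max_h, classify(max_h, **params))
```

Because the shared normalising term P(maxH) cancels, the ratio of the
two products equals the posterior odds, which is why the rule can be
stated in either form.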

Discriminant analysis simplifies this task by calculating
the odds P(E/maxH):P(I/maxH) as e^h, where h is the
left-hand side of a linear or quadratic inequality. For example,
for the window of length 17, the protein is allocated to the
peripheral category E based on the empirically derived quadratic
inequality:



1.0S(maxH) e +12.30maxH+17.H c l >D-» 

whereas the optimal inequality for assigning membrane
proteins (category I) is linear:

• OBmaxH + 1M-27 > □ 

The odds parameter can be made more or less stringent. For
example, one can require odds of at least 1:10 for a protein to
be classified as integral. This leads to higher selectivity but
less sensitivity.

The boundaries of membrane-spanning regions in putative
membrane proteins are detected by means of an iterative procedure
whereby the most hydrophobic region corresponding to the value
maxH is considered to be membrane and removed from the sequence.
The classification procedure is then repeated for the
remaining sequence, and, if such a protein is again classified as
integral, the next most hydrophobic region is considered.
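
The iterative procedure can be sketched in Python as below. This is
an assumption-laden illustration rather than the ALOM2
implementation: the function name is hypothetical, and a simple
cutoff on maxH (the `threshold` argument, with an arbitrary default)
stands in for the discriminant inequalities.

```python
# Kyte & Doolittle (1982) hydropathy index, one-letter amino acid codes.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def find_membrane_segments(sequence, window=17, threshold=1.6):
    """Repeatedly take the most hydrophobic block and remove it.

    Returns a sorted list of (first, last, maxH) tuples, where first
    and last are 0-based positions in the ORIGINAL sequence. The
    threshold is a placeholder for the real classification step.
    """
    seq = sequence.upper()
    remaining = list(range(len(seq)))   # original indices not yet removed
    segments = []
    while len(remaining) >= window:
        values = [KYTE_DOOLITTLE[seq[i]] for i in remaining]
        best_k, best_mean = 0, None
        for k in range(len(values) - window + 1):
            mean = sum(values[k:k + window]) / window
            if best_mean is None or mean > best_mean:
                best_k, best_mean = k, mean
        if best_mean <= threshold:
            break                       # no longer classified as integral
        block = remaining[best_k:best_k + window]
        # After earlier removals a block may straddle a removed region,
        # so first/last need not be exactly window - 1 apart.
        segments.append((block[0], block[-1], best_mean))
        del remaining[best_k:best_k + window]
    return sorted(segments)


if __name__ == "__main__":
    demo = "K" * 10 + "L" * 17 + "D" * 10 + "I" * 17 + "K" * 10
    for first, last, max_h in find_membrane_segments(demo):
        print(first, last, round(max_h, 2))
```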

Reference: Klein, P., Kanehisa, M., DeLisi, C. (1985) The
detection and classification of membrane-spanning proteins.
Biochim Biophys Acta 815: 468-476.

Transcription factors 

Purified eukaryotic RNA polymerase II (RNAPII) is unable to
initiate promoter-specific transcription. A family of factors that
collectively confer RNAPII promoter specificity is known as the
general transcription factors (GTFs). They include the
TATA-binding protein (TBP), TFIIB, TFIIE, TFIIF and TFIIH. These
factors are conserved among all eukaryotes.

RNAPII complexes containing the entire set of GTFs or a
subset of GTFs together with other proteins have been isolated
from mammalian and yeast cells. Although purified RNAPII and GTFs
are sufficient for promoter-specific initiation, this system
fails to respond to activators. The response to activators is
mediated by a further complex, termed the mediator complex, which
associates with the carboxy-terminal heptapeptide domain (CTD) of
the largest subunit of RNAPII.



Purification of human RNAPII complexes yielded two forms of
human RNAPII that are distinct in their functional properties.
One complex contained chromatin remodeling activities
but was devoid of GTFs. The other complex did not contain factors
that modify chromatin but contained a subset of SRB/mediator
subunits and GTFs and other polypeptides that mediate
transcriptional activation, a scenario similar to that reported
for yeast.

A complex designated NAT (~20 subunits), for negative
regulator of transcription, contains RNAPII, Cdk8, homologs of the
yeast mediator complex, as well as Rgr1 and Srb10/11, known as
negative regulators of transcription.

A complex with strikingly similar structural and functional
properties to NAT has been identified, designated SMCC (~15
subunits) (SRB/mediator coactivator complex), that can also
mediate transcriptional activation.

The SMCC complex includes all reported NAT subunits,
including subunits of the TRAP complex. TRAP is a coactivator
complex isolated on the basis of its interaction with the thyroid
hormone receptor. Another coactivator complex, DRIP, isolated on
the basis of its ability to interact with the vitamin D3
receptor, contains novel subunits as well as subunits of NAT/SMCC
and TRAP complexes.

The effect of each of these coactivator complexes is
dependent on the TFIID complex. It is not known if the TAF
subunits of TFIID are required. It is likely that new
coactivator complexes will be uncovered containing both novel and
previously defined components.

Besides the large number of transcription factors which can
be part of the RNAPII holoenzyme or the coactivator complexes,
there is an even larger number of specific transcription factors
that bind to promoter elements within the DNA sequences of a given
gene, leading to activation or repression of transcription. A
broad range of cellular responses like differentiation,
proliferation, cell death and others are elicited through
activating or repressing the transcription of target genes.
There are at least five superclasses of transcription
factors:

1. Superclass contains members with characteristic basic
domains:

Members are:

Leucine zipper factors, where the basic domain is followed
by a leucine zipper of repeated leucine residues at every seventh
position. The zipper mediates protein dimerization as a
prerequisite for DNA-binding.

Helix-loop-helix factors (bHLH) contain a DNA-binding basic
region followed by a motif of two potential amphipathic
alpha-helices connected by a loop of variable length, also
mediating dimerization.

Factors with a combination of helix-loop-helix and leucine
zipper.

Further members of this superclass are NF-1, RF-X, and
bHSH-like proteins.
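
The "leucine at every seventh position" criterion mentioned for
leucine zipper factors lends itself to a simple sequence scan. The
Python sketch below is illustrative only: the function name and the
minimum of four repeats are assumptions, and a positive hit is
merely a candidate zipper, since the test ignores helical context.

```python
def find_leucine_heptads(sequence, min_repeats=4):
    """Find runs of leucines spaced exactly seven residues apart.

    Returns (start, count) pairs: start is the 0-based position of
    the first leucine of a run, count the number of leucines in it.
    Only runs with at least `min_repeats` leucines are reported.
    """
    seq = sequence.upper()
    hits = []
    for start in range(len(seq)):
        if seq[start] != "L":
            continue
        # Skip positions that only continue a run begun 7 residues earlier.
        if start >= 7 and seq[start - 7] == "L":
            continue
        count, pos = 0, start
        while pos < len(seq) and seq[pos] == "L":
            count += 1
            pos += 7
        if count >= min_repeats:
            hits.append((start, count))
    return hits


if __name__ == "__main__":
    demo = "LAAAAAA" * 4 + "G"  # hypothetical test sequence
    print(find_leucine_heptads(demo))
```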

2. Superclass comprises factors containing zinc-coordinating
DNA-binding domains.

Members are:

Proteins with Cys4 zinc finger of nuclear receptor type,
where two such motifs differing in size, composition and function
are present in each receptor molecule. Each finger comprises 4
cysteine residues coordinating one zinc ion. The second half,
including the second cysteine pair, has alpha-helix conformation,
and the helix of the first finger binds to the DNA through the
major groove. The sequence between the first two cysteines of the
second finger mediates dimerization upon DNA-binding. This class
includes the steroid hormone receptors and the thyroid hormone
receptor-like factors. Other diverse Cys4 zinc fingers have a
motif of GATA-type.

Proteins with Cys2His2 zinc finger domain(s). Each finger
comprises 2 cysteine and 2 histidine residues coordinating one
zinc ion, and in some cases one histidine is replaced by another
cysteine. The zinc ion is essential for DNA-binding.

Proteins with Cys6 cysteine-zinc cluster(s). Six cysteine
residues coordinate two zinc ions, i.e. two of the thiol groups
are coordinating two zinc ions each. Present in many fungal
regulators.

Zinc fingers of alternating composition.

3. Superclass contains factors of helix-turn-helix type.
Members are:

Proteins with homeo domains. Homeo domains are three
consecutive alpha-helix structures. Helix 3 contacts mainly the
major groove of the DNA; some contacts at the minor groove are
observed as well. Helices 2 and 3 resemble the helix-turn-helix
structure of prokaryotic regulators.

Proteins with Paired box domain(s). This is a DNA-binding
domain of approximately 130 amino acid residues. Its N-terminal
half is basic; its C-terminal half is highly charged in general.
It probably comprises 3 alpha-helices.

Proteins with Fork head / winged helix domain(s). This
domain was identified by homology between HNF-3A and fkh. The
domain comprises approx. 110 AA. Analysis of the crystal
structure has revealed a compact structure of three
alpha-helices, the third alpha-helix being exposed towards the
major groove of the DNA. The domain also exerts minor groove
contacts. Upon binding to DNA, it induces a bend of 13 degrees.

Heat shock factors

Proteins with Tryptophan clusters. The tryptophan clusters
comprise several tryptophan residues with a spacing of 12-21
amino acid residues; the subclass of myb-type DNA-binding domains
typically exhibits a spacing of 19-21 amino acid residues.

Proteins with TEA domain(s). The TEA domain has been
identified as a region which is conserved among the transcription
factors TEF-1, TEC1 and abaA. This domain in TEF-1 has been shown
to interact with DNA, although two additional regions may also
contribute to DNA-binding. It is predicted to fold into three
alpha-helices, with a randomly coiled region of 16-18 amino acid
residues between helices 1 and 2, and a short stretch between
helices 2 and 3 of 3-8 residues.

4. Superclass contains beta-Scaffold Factors with Minor
Groove Contacts

Members are:

Proteins with RHR (Rel homology region).

The structure of the Rel-type DBD exhibits a bipartite
subdomain structure, each subdomain comprising a beta-barrel with
five loops that form an extensive contact surface to the major
groove of the DNA. In particular, the first loop of the N-terminal
subdomain (the highly conserved recognition loop) makes
contacts with the recognition element on the DNA, but other loops
are also involved. The fact that the main DNA contacts are made
through loops has been suggested to provide a high degree of
flexibility in binding to a range of different target sequences.
Augmenting interactions are achieved by two alpha-helices within
the N-terminal part that form strong minor groove contacts to the
A/T-rich center of the kappaB element. In p65, the sequence
between both alpha-helices is much shorter and even helix 2 is
truncated. The second, C-terminal domain is necessary mainly for
protein dimerization.

p53 proteins

MADS (MCM1-agamous-deficiens-SRF) box proteins. Proteins of
this class comprise a region of homology. The DNA-binding domain
also comprises the dimerization capability. In the DNA-bound
dimer (shown for SRF), two antiparallel amphipathic alpha-helices
(alpha-I) form a coiled coil and are oriented approximately
parallel on the minor groove. These helices make minor and major
groove contacts; the N-terminal extensions form minor groove
contacts. The bound DNA is bent and wrapped around the protein.
It exhibits a compressed minor groove in the center and widened
minor groove in the flanks.

Beta-Barrel alpha-helix transcription factors.

TATA-binding proteins

HMG proteins

Proteins of this class comprise a region of homology with
the chromosomal non-histone HMG proteins such as HMG1. This
region comprises the DNA-binding domain, which in some instances
such as HMG1 mediates sequence-unspecific, in other cases such
as LEF-1 sequence-specific, binding to DNA. This domain exhibits a
typical L-shaped conformation made up of 3 alpha-helices and an
extended N-terminal extension of the first helix. The latter
together with helix 1, which contains a kink, form the long arm
of the L, whereas helices 1 and 2 form the short arm. Binding to
the minor groove induces a sharp bending of the DNA by more than
90 degrees, away from the bound protein. The overall topology of
the DNA-protein complexes resembles somewhat that of the TBP-TATA
box complex.

Heteromeric CCAAT factors 

Proteins with Grainyhead domain(s) 

Cold-shock domain factors. Cold-shock domain proteins are
characterized by a highly conserved region first found in
prokaryotic cold-shock proteins. This domain is a single-stranded
nucleic acid-binding structure interacting with DNA or RNA. It
consists of an antiparallel five-stranded beta-barrel, the
strands of which are connected by turns and loops. Within this
structure, a three-stranded beta-sheet contains a conserved
RNA-binding motif, RNP1. Not all CSD proteins are transcription
factors. Those which specifically bind to a certain sequence are
termed Y-box proteins. Proteins of this class were previously
called protamine-like domain proteins because of having a highly
positively charged domain with interspersed proline residues.

Proteins with Runt homology domain

The members of this transcription factor class have been
identified on the basis of their homology to a defined region
within the Drosophila protein Runt. The runt domain is part of
the DNA-binding domain of these factors. It consists mainly of
beta-strands, does not contain alpha-helical regions and seems to
be most similar to the palm domain found in DNA polymerase beta
(rat).

5. Superclass contains other transcription factors like
Copper fist proteins, HMGI(Y), STAT, Pocket domain proteins and
AP2/EREBP-related factors.

The classification of transcription factors originates from
the TRANSFAC database:

https://transfac.gbf.de/TRANSFAC/

Reference: Heinemeyer

Several categories of proteins are coded for by clones of
the invention within the overall group of "Transcription Factors"
and include, among others, the following:

Homeobox proteins: Homeodomain-containing transcription
factors are essential for a variety of processes in vertebrate
development, including organogenesis. They have been shown to
regulate cell proliferation, pattern segmental identity
and determine cell fate decisions during embryogenesis. For
example, in zebrafish emx2 mRNAs are found in the dorsal
telencephalon, parts of the diencephalon and the otocyst. The
human homologue Emx2 appears to be already expressed in 8.5 day
embryos. It is also expressed in the presumptive cerebral cortex,
olfactory bulbs, in some neuroectodermal areas in embryonic head
including olfactory placodes in earlier stages and olfactory
epithelia later in development. Mutants of the D. melanogaster
gene "empty spiracles" display spiracles devoid of filzkörper,
no antenna and an open head. Clones in this category include:
amy2_mmlLi.

Proteins with myc-type, helix-loop-helix dimerization domain
signature(s). This helix-loop-helix domain, which mediates protein
dimerization, has been found in various multimeric transcription
factors. Clones in this category include: tes3_lflnlM.

Transcriptional silencers: In addition to transcription
factors, other proteins, such as YDL153c of Saccharomyces
cerevisiae, are responsible for silencing of genes. Clones in this
category include: amyE_2fE5.

Proteins regulating transcription factors: The activity of
several transcription factors is regulated by the binding or
dissociation of other proteins or by phosphorylation or
dephosphorylation of the transcription factor. For example,
I-kappa-B-related protein interacts with the transcription factor
NF-kB. I-kappa-B-alpha mutations contribute to constitutive
NF-kappaB activity in cultured and primary HRS
(Hodgkin/Reed-Sternberg) cells and are therefore involved in the
pathogenesis of Hodgkin's disease (HD). Clones in this category
include: amyE_lclE.

Signal transducing proteins: Beta-transducin subunits of G
proteins contain WD-40 repeats. The beta subunits seem to be
required for the replacement of GDP by GTP as well as for
membrane anchoring and receptor recognition. Due to the zinc
finger, the novel protein seems to be a new molecule involved in
signal transduction and transcription. These proteins have been
reported by OMIM to be associated (as potentially diagnostic,
therapeutic, causative, and/or related, etc.) with the following
diseases: 1) essential hypertension (OMIM *139130). Clones in
this category include: tes3_llcEE.

* * * 

The invention, therefore, specifically contemplates the
following assemblages of materials, which track the
above-identified fourteen functional groupings, that are useful in
practicing the profiling aspects of the invention. One type of
assemblage is nucleic acid-based and can include the following
groupings of sequences and their derivatives: all sequences;
human fetal brain sequences; brain derived sequences; human fetal
kidney library sequences; kidney derived sequences; human mammary
carcinoma library sequences; mammary carcinoma derived sequences;
human testis library sequences; testes derived sequences; cell
cycle genes; cell structure and motility genes; differentiation
and development genes; intracellular transport and trafficking
genes; metabolism genes; nucleic acid management genes; signal
transduction genes; transmembrane protein genes; and
transcription factor genes. Other assemblages contain proteins
or their corresponding antibodies or antibody fragments, divided
along the same groupings.

Database Applications

Because they are human genes and gene products, the
inventive molecules are useful as members of a database. Such a
database may be used, for example, in drug discovery and
rational drug design or in testing the novelty and
non-obviousness of newly sequenced materials. In addition, they
are particularly suited to designing variants for the profiling
(and other) applications described herein. Hence, the following
discussion of electronic embodiments applies equally to such
variants, which, naturally, will be generated and stored using a
computer using known methodologies.

Accordingly, one aspect of the invention contemplates a
database of at least one of the inventive sequences stored on
computer readable media. Again, the individual sequences may be
grouped with regard to the individual functional and structural
groups mentioned above. While the individual sequences of a
database may exist in printed form, they are preferably in
electronic form, as in an ASCII or a text file. They may also
exist as word processing files or they may be stored in database
applications like DB2, Sybase, Oracle, GCG and GenBank. One
skilled in the art will understand the range of applications
suitable for using and storing the electronic embodiments of the
invention.

"Computer readable media" refers to any medium which can be
read and accessed by a computer. These include: magnetic storage
media, like floppy discs, hard drives and magnetic tape; optical
storage media, like CD-ROM; electrical storage media, like RAM
and ROM; and hybrids of these categories, like magnetic/optical
storage media. One skilled in the art will readily understand
the scope of computer readable media and how to implement them.
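
By way of illustration only, a text-file embodiment of such a
database might be handled as in the following Python sketch. The
FASTA-style layout, the "group=" header tag, the function names and
the clone identifiers are all assumptions made for the example; they
are not prescribed by the invention.

```python
import io
from collections import defaultdict


def write_fasta(records, handle, width=60):
    """Write (clone_id, group, sequence) records as FASTA-style text."""
    for clone_id, group, seq in records:
        handle.write(f">{clone_id} group={group}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


def read_grouped(handle):
    """Read the text back into {group: {clone_id: sequence}}."""
    groups = defaultdict(dict)
    clone_id = group = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header layout assumed above: ">clone_id group=name"
            clone_id, _, group = line[1:].partition(" group=")
            groups[group][clone_id] = ""
        elif clone_id is not None:
            groups[group][clone_id] += line
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical records; identifiers and sequences are placeholders.
    records = [
        ("clone_a", "signal transduction", "ACGT" * 20),
        ("clone_b", "transcription factor", "GGCC"),
    ]
    buffer = io.StringIO()          # stands in for a file on disk
    write_fasta(records, buffer)
    buffer.seek(0)
    for group, members in read_grouped(buffer).items():
        print(group, sorted(members))
```

Keeping the functional group in the header line lets one flat file
serve both the "all sequences" assemblage and each per-group subset.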



Biological Activities and Assays for Implementing Therapeutic and 
Diagnostic Applications 

This section provides assays for biological activity that
are useful in characterizing and quantifying the biological
activity of the inventive molecules and their derivatives, which
is relevant to the pharmacological effects of the inventive
molecules. As used in this section, it will be understood that
"protein" may also refer to the inventive antibodies (including
fragments).

Cytokine and Cell Proliferation/Differentiation Activity 

A protein of the present invention may exhibit cytokine,
cell proliferation (either inducing or inhibiting) or cell
differentiation (either inducing or inhibiting) activity or may
induce production of other cytokines in certain cell populations.
Many protein factors discovered to date, including all known
cytokines, have exhibited activity in one or more factor
dependent cell proliferation assays, and hence the assays serve
as a convenient confirmation of cytokine activity. The activity
of a protein of the present invention is evidenced by any one of 
a number of routine factor dependent cell proliferation assays 
for cell lines including, without limitation, 32D, DA2, DA1G,
T10, B9, B9/11, BaF3, MC9/G, M+ (preB M+), 2E8, RB5, DA1, 123,
T1165, HT2, CTLL2, TF-1, Mo7e and CMK.

The activity of a protein of the invention may, among other
means, be measured by the following methods:

Assays for T-cell or thymocyte proliferation include without
limitation those described in: Current Protocols in Immunology,
Ed by J. E. Coligan, A. M. Kruisbeek, D. H. Margulies, E. M.
Shevach, W. Strober, Pub. Greene Publishing Associates and
Wiley-Interscience (Chapter 3, In Vitro assays for Mouse
Lymphocyte Function 3.1-3.19; Chapter 7, Immunologic studies in
Humans); Takai et al., J. Immunol. 137:3494-3500, 1986;
Bertagnolli et al., J. Immunol. 145:1706-1712, 1990; Bertagnolli
et al., Cellular Immunology 133:327-341, 1991; Bertagnolli, et
al., J. Immunol. 149:3778-3783, 1992; Bowman et al., J. Immunol.
152:1756-1761, 1994.

Assays for cytokine production and/or proliferation of
spleen cells, lymph node cells or thymocytes include, without
limitation, those described in: Polyclonal T cell stimulation,
Kruisbeek, A. M. and Shevach, E. M. In Current Protocols in
Immunology. J. E. e.a. Coligan eds. Vol 1 pp. 3.12.1-3.12.14,
John Wiley and Sons, Toronto. 1994; and Measurement of mouse and
human interferon gamma, Schreiber, R. D. In Current Protocols
in Immunology. J. E. e.a. Coligan eds. Vol 1 pp. 6.8.1-6.8.8,
John Wiley and Sons, Toronto.

Assays for proliferation and differentiation of
hematopoietic and lymphopoietic cells include, without
limitation, those described in: Measurement of Human and Murine
Interleukin 2 and Interleukin 4, Bottomly, K., Davis, L. S. and
Lipsky, P. E. In Current Protocols in Immunology. J. E. e.a.
Coligan eds. Vol 1 pp. 6.3.1-6.3.12, John Wiley and Sons,
Toronto. 1991; deVries et al., J. Exp. Med. 173:1205-1211, 1991;
Moreau et al., Nature 336:690-692, 1988; Greenberger et al.,
Proc. Natl. Acad. Sci. U.S.A. 80:2931-2938, 1983; Measurement of
mouse and human interleukin 6, Nordan, R. In Current Protocols in
Immunology. J. E. e.a. Coligan eds. Vol 1 pp. 6.6.1-6.6.5, John
Wiley and Sons, Toronto. 1991; Smith et al., Proc. Natl. Acad.
Sci. U.S.A. 83:1857-1861, 1986; Measurement of human Interleukin
11, Bennett, F., Giannotti, J., Clark, S. C. and Turner, K. J. In
Current Protocols in Immunology. J. E. e.a. Coligan eds. Vol 1
pp. 6.15.1, John Wiley and Sons, Toronto. 1991; Measurement of
mouse and human Interleukin 9, Ciarletta, A., Giannotti, J.,
Clark, S. C. and Turner, K. J. In Current Protocols in
Immunology. J. E. e.a. Coligan eds. Vol 1 pp. 6.13.1, John Wiley
and Sons, Toronto. 1991.

Assays for T-cell clone responses to antigens (which will
identify, among others, proteins that affect APC-T cell
interactions as well as direct T-cell effects by measuring
proliferation and cytokine production) include, without
limitation, those described in: Current Protocols in Immunology,
Ed by J. E. Coligan, A. M. Kruisbeek, D. H. Margulies, E. M.
Shevach, W. Strober, Pub. Greene Publishing Associates and
Wiley-Interscience (Chapter 3, In Vitro assays for Mouse
Lymphocyte Function; Chapter 6, Cytokines and their cellular
receptors; Chapter 7, Immunologic studies in Humans); Weinberger
et al., Proc. Natl. Acad. Sci. USA 77:6091-6095, 1980; Weinberger
et al., Eur. J. Immun. 11:405-411, 1981; Takai et al., J.
Immunol. 137:3494-3500, 1986; Takai et al., J. Immunol.
140:508-512, 1988.

Immune Stimulating or Suppressing Activity

A protein of the present invention may also exhibit immune
stimulating or immune suppressing activity, including without
limitation the activities for which assays are described herein.
A protein may be useful in the treatment of various immune
deficiencies and disorders (including severe combined
immunodeficiency (SCID)), e.g., in regulating (up or down) growth
and proliferation of T and/or B lymphocytes, as well as affecting
the cytolytic activity of NK cells and other cell populations.
These immune deficiencies may be genetic or be caused by viral
(e.g., HIV) as well as bacterial or fungal infections, or may
result from autoimmune disorders. More specifically, infectious
diseases caused by viral, bacterial, fungal or other infection
may be treatable using a protein of the present invention,
including infections by HIV, hepatitis viruses, herpesviruses,
mycobacteria, Leishmania spp., malaria spp. and various fungal
infections such as candidiasis. Of course, in this regard, a
protein of the present invention may also be useful where a boost
to the immune system generally may be desirable, i.e., in the
treatment of cancer.

Autoimmune disorders which may be treated using a protein of
the present invention include, for example, connective tissue
disease, multiple sclerosis, systemic lupus erythematosus,
rheumatoid arthritis, autoimmune pulmonary inflammation,
Guillain-Barre syndrome, autoimmune thyroiditis, insulin
dependent diabetes mellitus, myasthenia gravis, graft-versus-host
disease and autoimmune inflammatory eye disease. Such a protein
of the present invention may also be useful in the treatment
of allergic reactions and conditions, such as asthma
(particularly allergic asthma) or other respiratory problems.
Other conditions, in which immune suppression is desired
(including, for example, organ transplantation), may also be
treatable using a protein of the present invention.

Using the proteins of the invention it may also be possible
to modify immune responses in a number of ways. Down regulation
may be in the form of inhibiting or blocking an immune response
already in progress or may involve preventing the induction of an
immune response. The functions of activated T cells may be
inhibited by suppressing T cell responses or by inducing specific
tolerance in T cells, or both. Immunosuppression of T cell
responses is generally an active, non-antigen-specific, process
which requires continuous exposure of the T cells to the
suppressive agent. Tolerance, which involves inducing
non-responsiveness or anergy in T cells, is distinguishable from
immunosuppression in that it is generally antigen-specific and
persists after exposure to the tolerizing agent has ceased.
Operationally, tolerance can be demonstrated by the lack of a T
cell response upon reexposure to specific antigen in the absence
of the tolerizing agent.

Down regulating or preventing one or more antigen functions
(including without limitation B lymphocyte antigen functions
(such as, for example, B7)), e.g., preventing high level
lymphokine synthesis by activated T cells, will be useful in
situations of tissue, skin and organ transplantation and in
graft-versus-host disease (GVHD). For example, blockage of T cell
function should result in reduced tissue destruction in tissue
transplantation. Typically, in tissue transplants, rejection of
the transplant is initiated through its recognition as foreign by
T cells, followed by an immune reaction that destroys the
transplant. The administration of a molecule which inhibits or
blocks interaction of a B7 lymphocyte antigen with its natural
ligand(s) on immune cells (such as a soluble, monomeric form of a
peptide having B7-2 activity alone or in conjunction with a
monomeric form of a peptide having an activity of another B
lymphocyte antigen (e.g., B7-1, B7-3) or blocking antibody),
prior to transplantation can lead to the binding of the molecule
to the natural ligand(s) on the immune cells without transmitting
the corresponding costimulatory signal. Blocking B lymphocyte
antigen function in this manner prevents cytokine synthesis by
immune cells, such as T cells, and thus acts as an
immunosuppressant. Moreover, the lack of costimulation may also
be sufficient to anergize the T cells, thereby inducing tolerance
in a subject. Induction of long-term tolerance by B lymphocyte
antigen-blocking reagents may avoid the necessity of repeated
administration of these blocking reagents. To achieve sufficient
immunosuppression or tolerance in a subject, it may also be
necessary to block the function of a combination of B lymphocyte
antigens.

The efficacy of particular blocking reagents in preventing
organ transplant rejection or GVHD can be assessed using animal
models that are predictive of efficacy in humans. Examples of
appropriate systems which can be used include allogeneic cardiac
grafts in rats and xenogeneic pancreatic islet cell grafts in
mice, both of which have been used to examine the
immunosuppressive effects of CTLA4Ig fusion proteins in vivo as
described in Lenschow et al., Science 257:789-792 (1992) and
Turka et al., Proc. Natl. Acad. Sci USA, 89:11102-11105 (1992).
In addition, murine models of GVHD (see Paul ed., Fundamental
Immunology, Raven Press, New York, 1989, pp. 846-847) can be used
to determine the effect of blocking B lymphocyte antigen function
in vivo on the development of that disease.

Blocking antigen function may also be therapeutically useful
for treating autoimmune diseases. Many autoimmune disorders are
the result of inappropriate activation of T cells that are
reactive against self tissue and which promote the production of
cytokines and autoantibodies involved in the pathology of the
diseases. Preventing the activation of autoreactive T cells may
reduce or eliminate disease symptoms. Administration of reagents
which block costimulation of T cells by disrupting
receptor:ligand interactions of B lymphocyte antigens can be used
to inhibit T cell activation and prevent production of
autoantibodies or T cell-derived cytokines which may be involved
in the disease process. Additionally, blocking reagents may
induce antigen-specific tolerance of autoreactive T cells which
could lead to long-term relief from the disease. The efficacy of
blocking reagents in preventing or alleviating autoimmune
disorders can be determined using a number of well-characterized
animal models of human autoimmune diseases. Examples include
murine experimental autoimmune encephalitis, systemic lupus
erythematosus in MRL/lpr/lpr mice or NZB hybrid mice, murine
autoimmune collagen arthritis, diabetes mellitus in NOD mice and
BB rats, and murine experimental myasthenia gravis (see Paul ed.,
Fundamental Immunology, Raven Press, New York, 1989, pp.
840-856).

Upregulation of an antigen function (preferably a B
lymphocyte antigen function), as a means of up regulating immune
responses, may also be useful in therapy. Upregulation of immune
responses may be in the form of enhancing an existing immune
response or eliciting an initial immune response. For example,
enhancing an immune response through stimulating B lymphocyte
antigen function may be useful in cases of viral infection. In
addition, systemic viral diseases such as influenza, the common
cold, and encephalitis might be alleviated by the administration
of stimulatory forms of B lymphocyte antigens systemically.

Alternatively, anti-viral immune responses may be enhanced
in an infected patient by removing T cells from the patient,
costimulating the T cells in vitro with viral antigen-pulsed APCs
either expressing a peptide of the present invention or together
with a stimulatory form of a soluble peptide of the present
invention and reintroducing the in vitro activated T cells into
the patient. Another method of enhancing anti-viral immune
responses would be to isolate infected cells from a patient,
transfect them with a nucleic acid encoding a protein of the
present invention as described herein such that the cells express
all or a portion of the protein on their surface, and reintroduce
the transfected cells into the patient. The infected cells would
now be capable of delivering a costimulatory signal to, and
thereby activating, T cells in vivo.

In another application, up regulation or enhancement of
antigen function (preferably B lymphocyte antigen function) may
be useful in the induction of tumor immunity. Tumor cells (e.g.,
sarcoma, melanoma, lymphoma, leukemia, neuroblastoma, carcinoma)
transfected with a nucleic acid encoding at least one peptide of
the present invention can be administered to a subject to
overcome tumor-specific tolerance in the subject. If desired, the
tumor cell can be transfected to express a combination of
peptides. For example, tumor cells obtained from a patient can be
transfected ex vivo with an expression vector directing the
expression of a peptide having B7-2-like activity alone, or in
conjunction with a peptide having B7-1-like activity and/or
B7-3-like activity. The transfected tumor cells are returned to
the patient to result in expression of the peptides on the
surface of the transfected cell. Alternatively, gene therapy
techniques can be used to target a tumor cell for transfection in
vivo.

The presence of the peptide of the present invention having
the activity of a B lymphocyte antigen(s) on the surface of the
tumor cell provides the necessary costimulation signal to T cells
to induce a T cell mediated immune response against the
transfected tumor cells. In addition, tumor cells which lack MHC
class I or MHC class II molecules, or which fail to reexpress
sufficient amounts of MHC class I or MHC class II molecules, can
be transfected with nucleic acid encoding all or a portion
(e.g., a cytoplasmic-domain truncated portion) of an MHC class I
alpha chain protein and beta 2 microglobulin protein or an MHC
class II alpha chain protein and an MHC class II beta chain
protein to thereby express MHC class I or MHC class II proteins
on the cell surface. Expression of the appropriate class I or
class II MHC in conjunction with a peptide having the activity of
a B lymphocyte antigen (e.g., B7-1, B7-2, B7-3) induces a T cell
mediated immune response against the transfected tumor cell.
Optionally, a gene encoding an antisense construct which blocks
expression of an MHC class II associated protein, such as the
invariant chain, can also be cotransfected with a DNA encoding a
peptide having the activity of a B lymphocyte antigen to promote
presentation of tumor associated antigens and induce tumor
specific immunity. Thus, the induction of a T cell mediated
immune response in a human subject may be sufficient to overcome
tumor-specific tolerance in the subject.

The activity of a protein of the invention may, among other
means, be measured by the following methods:

Suitable assays for thymocyte or splenocyte cytotoxicity
include, without limitation, those described in: Current
Protocols in Immunology, Ed by J. E. Coligan, A. M. Kruisbeek, D.
H. Margulies, E. M. Shevach, W. Strober, Pub. Greene Publishing
Associates and Wiley-Interscience (Chapter 3, In Vitro assays for
Mouse Lymphocyte Function 3.1-3.19; Chapter 7, Immunologic
studies in Humans); Herrmann et al., Proc. Natl. Acad. Sci. USA
78:2488-2492, 1981; Herrmann et al., J. Immunol. 128:1968-1974,
1982; Handa et al., J. Immunol. 135:1564-1572, 1985; Takai et
al., J. Immunol. 137:3494-3500, 1986; Takai et al., J. Immunol.
140:508-512, 1988; Herrmann et al., Proc. Natl. Acad. Sci. USA
78:2488-2492, 1981; Herrmann et al., J. Immunol. 128:1968-1974,
1982; Handa et al., J. Immunol. 135:1564-1572, 1985; Takai et
al., J. Immunol. 137:3494-3500, 1986; Bowman et al., J. Virology
61:1992-1998; Takai et al., J. Immunol. 140:508-512, 1988;
Bertagnolli et al., Cellular Immunology 133:327-341, 1991; Brown
et al., J. Immunol. 153:3079-3092, 1994.

Assays for T-cell-dependent immunoglobulin responses and
isotype switching (which will identify, among others, proteins
that modulate T-cell dependent antibody responses and that affect
Th1/Th2 profiles) include, without limitation, those described
in: Maliszewski, J. Immunol. 144:3028-3033, 1990; and Assays for
B cell function: In vitro antibody production, Mond, J. J. and
Brunswick, M. In Current Protocols in Immunology. J. E. e.a.
Coligan eds. Vol 1 pp. 3.8.1-3.8.16, John Wiley and Sons,
Toronto. 1994.

Mixed lymphocyte reaction (MLR) assays (which will identify,
among others, proteins that generate predominantly Th1 and CTL
responses) include, without limitation, those described in:
Current Protocols in Immunology, Ed by J. E. Coligan, A. M.
Kruisbeek, D. H. Margulies, E. M. Shevach, W. Strober, Pub.
Greene Publishing Associates and Wiley-Interscience (Chapter 3,
In Vitro assays for Mouse Lymphocyte Function 3.1-3.19; Chapter
7, Immunologic studies in Humans); Takai et al., J. Immunol.
137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512, 1988;
Bertagnolli et al., J. Immunol. 149:3778-3783, 1992.

Dendritic cell-dependent assays (which will identify, among
others, proteins expressed by dendritic cells that activate naive
T-cells) include, without limitation, those described in: Guery
et al., J. Immunol. 134:536-544, 1995; Inaba et al., Journal of
Experimental Medicine 173:549-559, 1991; Macatonia et al.,
Journal of Immunology 154:5071-5079, 1995; Porgador et al.,
Journal of Experimental Medicine 182:255-260, 1995; Nair et al.,
Journal of Virology 67:4062-4069, 1993; Huang et al., Science
264:961-965, 1994; Macatonia et al., Journal of Experimental
Medicine 169:1255-1264, 1989; Bhardwaj et al., Journal of
Clinical Investigation 94:797-807, 1994; and Inaba et al.,
Journal of Experimental Medicine 172:631-640, 1990.

Assays for lymphocyte survival/apoptosis (which will
identify, among others, proteins that prevent apoptosis after
superantigen induction and proteins that regulate lymphocyte
homeostasis) include, without limitation, those described in:
Darzynkiewicz et al., Cytometry 13:795-808, 1992; Gorczyca et
al., Leukemia 7:659-670, 1993; Gorczyca et al., Cancer Research
53:1945-1951; Itoh et al., Cell 66:233-243, 1991;
Zacharchuk, Journal of Immunology 145:4037-4045; Zamai et
al., Cytometry 14:891-897, 1993; Gorczyca et al., International
Journal of Oncology 1:639-648, 1992.

Assays for proteins that influence early steps of T-cell
commitment and development include, without limitation, those
described in: Antica et al., Blood 84:111-117, 1994; Fine et al.,
Cellular Immunology 155:111-122, 1994; Galy et al., Blood
85:2770-2778, 1995; Toki et al., Proc. Nat. Acad Sci. USA
88:7548-7551, 1991.

Hematopoiesis Regulating Activity

A protein of the present invention may be useful in
regulation of hematopoiesis and, consequently, in the treatment
of myeloid or lymphoid cell deficiencies. Even marginal
biological activity in support of colony forming cells or of
factor-dependent cell lines indicates involvement in regulating
hematopoiesis, e.g. in supporting the growth and proliferation of
erythroid progenitor cells alone or in combination with other
cytokines, thereby indicating utility, for example, in treating
various anemias or for use in conjunction with
irradiation/chemotherapy to stimulate the production of erythroid
precursors and/or erythroid cells; in supporting the growth and
proliferation of myeloid cells such as granulocytes and
monocytes/macrophages (i.e., traditional CSF activity) useful,
for example, in conjunction with chemotherapy to prevent or treat
consequent myelo-suppression; in supporting the growth and
proliferation of megakaryocytes and consequently of platelets
thereby allowing prevention or treatment of various platelet
disorders such as thrombocytopenia, and generally for use in
place of or complementary to platelet transfusions; and/or in
supporting the growth and proliferation of hematopoietic stem
cells which are capable of maturing to any and all of the
above-mentioned hematopoietic cells and therefore find
therapeutic utility in various stem cell disorders (such as those
usually treated with transplantation, including, without
limitation, aplastic anemia and paroxysmal nocturnal
hemoglobinuria), as well as in repopulating the stem cell
compartment post irradiation/chemotherapy, either in-vivo or
ex-vivo (i.e., in conjunction with bone marrow transplantation or
with peripheral progenitor cell transplantation (homologous or
heterologous)) as normal cells or genetically manipulated for
gene therapy.

The activity of a protein of the invention may, among other
means, be measured by the following methods:

Suitable assays for proliferation and differentiation of
various hematopoietic lines are cited above.

Assays for embryonic stem cell differentiation (which will
identify, among others, proteins that influence embryonic
differentiation hematopoiesis) include, without limitation, those
described in: Johansson et al. Cellular Biology 15:141-151;
Keller et al., Molecular and Cellular Biology, 1993;
McClanahan et al., Blood 81:2903-2915, 1993.

Assays for stem cell survival and differentiation (which
will identify, among others, proteins that regulate
lympho-hematopoiesis) include, without limitation, those
described in: Methylcellulose colony forming assays, Freshney, M.
G. In Culture of Hematopoietic Cells. R. I. Freshney, et al. eds.
Vol pp. 265-268, Wiley-Liss, Inc., New York, N.Y. 1994; Hirayama
et al., Proc. Natl. Acad. Sci. USA 89:5907-5911, 1992; Primitive
hematopoietic colony forming cells with high proliferative
potential, McNiece, I. K. and Briddell, R. A. In Culture of
Hematopoietic Cells. R. I. Freshney, et al. eds. Vol pp. 23-39,
Wiley-Liss, Inc., New York, N.Y.; Neben et al., Experimental
Hematology 22:353-359; Cobblestone area forming cell assay,
Ploemacher, R. E. In Culture of Hematopoietic Cells. R. I.
Freshney, et al. eds. Vol pp. 1-21, Wiley-Liss, Inc., New York,
N.Y.; Long term bone marrow cultures in the presence of
stromal cells, Spooncer, E., Dexter, M. and Allen, T. In Culture
of Hematopoietic Cells. R. I. Freshney, et al. eds. Vol pp.
163-179, Wiley-Liss, Inc., New York, N.Y. 1994; Long term culture
initiating cell assay, Sutherland, H. J. In Culture of
Hematopoietic Cells. R. I. Freshney, et al. eds. Vol pp. 139-162,
Wiley-Liss, Inc., New York, N.Y. 1994.

Tissue Growth Activity 

A protein of the present invention also may have utility in
compositions used for bone, cartilage, tendon, ligament and/or
nerve tissue growth or regeneration, as well as for wound healing
and tissue repair and replacement, and in the treatment of burns,
incisions and ulcers.

A protein of the present invention, which induces cartilage
and/or bone growth in circumstances where bone is not normally
formed, has application in the healing of bone fractures and
cartilage damage or defects in humans and other animals. Such a
preparation employing a protein of the invention may have
prophylactic use in closed as well as open fracture reduction and
also in the improved fixation of artificial joints. De novo bone
formation induced by an osteogenic agent contributes to the
repair of congenital, trauma induced, or oncologic resection
induced craniofacial defects, and also is useful in cosmetic
plastic surgery.

A protein of this invention may also be used in the
treatment of periodontal disease, and in other tooth repair
processes. Such agents may provide an environment to attract
bone-forming cells, stimulate growth of bone-forming cells or
induce differentiation of progenitors of bone-forming cells. A
protein of the invention may also be useful in the treatment of
osteoporosis or osteoarthritis, such as through stimulation of
bone and/or cartilage repair or by blocking inflammation or
processes of tissue destruction (collagenase activity, osteoclast
activity, etc.) mediated by inflammatory processes.

Another category of tissue regeneration activity that may be
attributable to the protein of the present invention is
tendon/ligament formation. A protein of the present invention,
which induces tendon/ligament-like tissue or other tissue
formation in circumstances where such tissue is not normally
formed, has application in the healing of tendon or ligament
tears, deformities and other tendon or ligament defects in humans
and other animals. Such a preparation employing a
tendon/ligament-like tissue inducing protein may have
prophylactic use in preventing damage to tendon or ligament
tissue, as well as use in the improved fixation of tendon or
ligament to bone or other tissues, and in repairing defects to
tendon or ligament tissue. De novo tendon/ligament-like tissue
formation induced by a composition of the present invention
contributes to the repair of congenital, trauma induced, or other
tendon or ligament defects of other origin, and is also useful in
cosmetic plastic surgery for attachment or repair of tendons or
ligaments. The compositions of the present invention may provide
an environment to attract tendon- or ligament-forming cells,
stimulate growth of tendon- or ligament-forming cells, induce
differentiation of progenitors of tendon- or ligament-forming
cells, or induce growth of tendon/ligament cells or progenitors
ex vivo for return in vivo to effect tissue repair. The
compositions of the invention may also be useful in the treatment
of tendonitis, carpal tunnel syndrome and other tendon or
ligament defects. The compositions may also include an
appropriate matrix and/or sequestering agent as a carrier as is
well known in the art.

The protein of the present invention may also be useful for
proliferation of neural cells and for regeneration of nerve and
brain tissue, i.e. for the treatment of central and peripheral
nervous system diseases and neuropathies, as well as mechanical
and traumatic disorders, which involve degeneration, death or
trauma to neural cells or nerve tissue. More specifically, a
protein may be used in the treatment of diseases of the
peripheral nervous system, such as peripheral nerve injuries,
peripheral neuropathy and localized neuropathies, and central
nervous system diseases, such as Alzheimer's, Parkinson's
disease, Huntington's disease, amyotrophic lateral sclerosis, and
Shy-Drager syndrome. Further conditions which may be treated in
accordance with the present invention include mechanical and
traumatic disorders, such as spinal cord disorders, head trauma
and cerebrovascular diseases such as stroke. Peripheral
neuropathies resulting from chemotherapy or other medical
therapies may also be treatable using a protein of the invention.

Proteins of the invention may also be useful to promote
better or faster closure of non-healing wounds, including without
limitation pressure ulcers, ulcers associated with vascular
insufficiency, surgical and traumatic wounds, and the like.

It is expected that a protein of the present invention may
also exhibit activity for generation or regeneration of other
tissues, such as organs (including, for example, pancreas, liver,
intestine, kidney, skin, endothelium), muscle (smooth, skeletal
or cardiac) and vascular (including vascular endothelium) tissue,
or for promoting the growth of cells comprising such tissues.
Part of the desired effects may be by inhibition or modulation of
fibrotic scarring to allow normal tissue to regenerate. A protein
of the invention may also exhibit angiogenic activity.

A protein of the present invention may also be useful for
gut protection or regeneration and treatment of lung or liver
fibrosis, reperfusion injury in various tissues, and conditions
resulting from systemic cytokine damage.

A protein of the present invention may also be useful for
promoting or inhibiting differentiation of tissues described
above from precursor tissues or cells, or for inhibiting the
growth of tissues described above.

The activity of a protein of the invention may, among other
means, be measured by the following methods:

Assays for tissue generation activity include, without
limitation, those described in: International Patent Publication
No. WO95/16035 (bone, cartilage, tendon); International Patent
Publication No. WO95/05846 (nerve, neuronal); International
Patent Publication No. WO91/07491 (skin, endothelium).

Assays for wound healing activity include, without
limitation, those described in: Winter, Epidermal Wound Healing,
pps. 71-112 (Maibach, H. I. and Rovee, D. T., eds.), Year Book
Medical Publishers, Inc., Chicago, as modified by Eaglstein and
Mertz, J. Invest. Dermatol (1978).

Activin/Inhibin Activity

A protein of the present invention may also exhibit activin-
or inhibin-related activities. Inhibins are characterized by
their ability to inhibit the release of follicle stimulating
hormone (FSH), while activins are characterized by their
ability to stimulate the release of follicle stimulating hormone
(FSH). Thus, a protein of the present invention, alone or in
heterodimers with a member of the inhibin alpha family, may be
useful as a contraceptive based on the ability of inhibins to
decrease fertility in female mammals and decrease spermatogenesis
in male mammals. Administration of sufficient amounts of other
inhibins can induce infertility in these mammals. Alternatively,
the protein of the invention, as a homodimer or as a heterodimer
with other protein subunits of the inhibin-beta group, may be
useful as a fertility inducing therapeutic, based upon the
ability of activin molecules to stimulate FSH release from
cells of the anterior pituitary. See, for example, U.S. Pat. No.
4,798,885. A protein of the invention may also be useful for
advancement of the onset of fertility in sexually immature
mammals, so as to increase the lifetime reproductive performance
of domestic animals such as cows, sheep and pigs.

The activity of a protein of the invention may, among other
means, be measured by the following methods:

Assays for activin/inhibin activity include, without
limitation, those described in: Vale et al., Endocrinology
91:562-572, 1972; Ling et al., Nature 321:779-782, 1986; Vale et
al., Nature 321:776-779, 1986; Mason et al., Nature 318:659-663,
1985; Forage et al., Proc. Natl. Acad. Sci. USA 83:3091-3095,
1986.

Chemotactic/Chemokinetic Activity

A protein of the present invention may have chemotactic or
chemokinetic activity (e.g., act as a chemokine) for mammalian
cells, including, for example, monocytes, fibroblasts,
neutrophils, T-cells, mast cells, eosinophils, epithelial and/or
endothelial cells. Chemotactic and chemokinetic proteins can be
used to mobilize or attract a desired cell population to a
desired site of action. Chemotactic or chemokinetic proteins
provide particular advantages in treatment of wounds and other
trauma to tissues, as well as in treatment of localized
infections. For example, attraction of lymphocytes, monocytes or
neutrophils to tumors or sites of infection may result in
improved immune responses against the tumor or infecting agent.

A protein or peptide has chemotactic activity for a
particular cell population if it can stimulate, directly or
indirectly, the directed orientation or movement of such cell
population. Preferably, the protein or peptide has the ability to
directly stimulate directed movement of cells. Whether a
particular protein has chemotactic activity for a population of
cells can be readily determined by employing such protein or
peptide in any known assay for cell chemotaxis.

The activity of a protein of the invention may, among other
means, be measured by the following methods:

Assays for chemotactic activity (which will identify
proteins that induce or prevent chemotaxis) consist of assays
that measure the ability of a protein to induce the migration of
cells across a membrane as well as the ability of a protein to
induce the adhesion of one cell population to another cell
population. Suitable assays for movement and adhesion include,
without limitation, those described in: Current Protocols in
Immunology, Ed. by J. E. Coligan, A. M. Kruisbeek, D. H.
Margulies, E. M. Shevach, W. Strober, Pub. Greene Publishing
Associates and Wiley-Interscience (Chapter 6.12, Measurement of
alpha and beta Chemokines 6.12.1-6.12.28); Taub et al., J. Clin.
Invest. 95:1370-1376, 1995; Lind et al., APMIS 103:140-146, 1995;
Muller et al., Eur. J. Immunol. 25:1744-1748; Gruber et al., J. of
Immunol. 152:5860-5867, 1994; Johnston et al., J. of Immunol.
153:1762-1768, 1994.

Hemostatic and Thrombolytic Activity

A protein of the invention may also exhibit hemostatic or
thrombolytic activity. As a result, such a protein is expected to
be useful in treatment of various coagulation disorders
(including hereditary disorders, such as hemophilias) or to
enhance coagulation and other hemostatic events in treating
wounds resulting from trauma, surgery or other causes. A protein
of the invention may also be useful for dissolving or inhibiting
formation of thromboses and for treatment and prevention of
conditions resulting therefrom (such as, for example, infarction
of cardiac and central nervous system vessels (e.g., stroke)).

The activity of a protein of the invention may, among other
means, be measured by the following methods:

Assays for hemostatic and thrombolytic activity include,
without limitation, those described in: Linet et al., J. Clin.
Pharmacol. 26:131-140, 1986; Burdick et al., Thrombosis Res.
45:413-419, 1987; Humphrey et al., Fibrinolysis 5:71-79;
Schaub, Prostaglandins 35:467-474, 1988.

Receptor/Ligand Activity

A protein of the present invention may also demonstrate
activity as receptors, receptor ligands or inhibitors or agonists
of receptor/ligand interactions. Examples of such receptors and
ligands include, without limitation, cytokine receptors and their
ligands, receptor kinases and their ligands, receptor
phosphatases and their ligands, receptors involved in cell-cell
interactions and their ligands (including without limitation,
cellular adhesion molecules (such as selectins, integrins and
their ligands) and receptor/ligand pairs involved in antigen
presentation, antigen recognition and development of cellular and
humoral immune responses). Receptors and ligands are also useful
for screening of potential peptide or small molecule inhibitors
of the relevant receptor/ligand interaction. A protein of the
present invention (including, without limitation, fragments of
receptors and ligands) may itself be useful as an inhibitor of
receptor/ligand interactions.
The activity of a protein of the invention may, among other
means, be measured by the following methods:

Suitable assays for receptor-ligand activity include without
limitation those described in: Current Protocols in Immunology, Ed.
by J. E. Coligan, A. M. Kruisbeek, D. H. Margulies, E. M.
Shevach, W. Strober, Pub. Greene Publishing Associates and Wiley-
Interscience (Chapter 7.28, Measurement of Cellular Adhesion
under static conditions 7.28.1-7.28.22); Takai et al., Proc.
Natl. Acad. Sci. USA 84:6864-6868, 1987; Bierer et al., J. Exp.
Med. 168:1145-1156, 1988; Rosenstein et al., J. Exp. Med.
169:149-160, 1989; Stoltenborg et al., J. Immunol. Methods
175:59-68, 1994; Stitt et al., Cell 80:661-670, 1995.

Anti-Inflammatory Activity

Proteins of the present invention may also exhibit anti-
inflammatory activity. The anti-inflammatory activity may be
achieved by providing a stimulus to cells involved in the
inflammatory response, by inhibiting or promoting cell-cell
interactions (such as, for example, cell adhesion), by inhibiting
or promoting chemotaxis of cells involved in the inflammatory
process, by inhibiting or promoting cell extravasation, or by
stimulating or suppressing production of other factors which more
directly inhibit or promote an inflammatory response. Proteins
exhibiting such activities can be used to treat inflammatory
conditions (including chronic or acute conditions), including
without limitation inflammation associated with infection (such as
septic shock, sepsis or systemic inflammatory response syndrome
(SIRS)), ischemia-reperfusion injury, endotoxin lethality,
arthritis, complement-mediated hyperacute rejection, nephritis,
cytokine or chemokine-induced lung injury, inflammatory bowel
disease, Crohn's disease or resulting from over production of
cytokines such as TNF or IL-1. Proteins of the invention may also
be useful to treat anaphylaxis and hypersensitivity to an
antigenic substance or material.
Tumor Inhibition Activity

In addition to the activities described above for
immunological treatment or prevention of tumors, a protein of the
invention may exhibit other anti-tumor activities. A protein may
inhibit tumor growth directly or indirectly (such as, for
example, via ADCC). A protein may exhibit its tumor inhibitory
activity by acting on tumor tissue or tumor precursor tissue, by
inhibiting formation of tissues necessary to support tumor growth
(such as, for example, by inhibiting angiogenesis), by causing
production of other factors, agents or cell types which inhibit
tumor growth, or by suppressing, eliminating or inhibiting
factors, agents or cell types which promote tumor growth.

Other Activities

A protein of the invention may also exhibit one or more of
the following additional activities or effects: inhibiting the
growth, infection or function of, or killing, infectious agents,
including, without limitation, bacteria, viruses, fungi and other
parasites; affecting (suppressing or enhancing) bodily
characteristics, including, without limitation, height, weight,
hair color, eye color, skin, fat to lean ratio or other tissue
pigmentation, or organ or body part size or shape (such as, for
example, breast augmentation or diminution, change in bone form
or shape); affecting biorhythms or circadian cycles or rhythms;
affecting the fertility of male or female subjects; affecting the
metabolism, catabolism, anabolism, processing, utilization,
storage or elimination of dietary fat, lipid, protein,
carbohydrate, vitamins, minerals, cofactors or other nutritional
factors or component(s); affecting behavioral characteristics,
including, without limitation, appetite, libido, stress,
cognition (including cognitive disorders), depression (including
depressive disorders) and violent behaviors; providing analgesic
effects or other pain reducing effects; promoting differentiation
and growth of embryonic stem cells in lineages other than
hematopoietic lineages; hormonal or endocrine activity; in the
case of enzymes, correcting deficiencies of the enzyme and
treating deficiency-related diseases; treatment of
hyperproliferative disorders (such as, for example, psoriasis);
immunoglobulin-like activity (such as, for example, the ability
to bind antigens or complement); and the ability to act as an
antigen in a vaccine composition to raise an immune response
against such protein or another material or entity which is
cross-reactive with such protein.

Particular Applications for Certain Clones

The following sets out a non-exclusive list of applications
for certain embodiments of the invention. In the interest of
economy, applications relevant to multiple embodiments are not
duplicated in this list. Other embodiments described herein have
similar characteristics, as described there. The artisan is
directed, therefore, to the Description of the Sequences for
similar descriptions of the functions of other embodiments.



Testes:

htes3_lDilb: The new protein can find application in
diagnosis/therapy in leukemia predisposition/disease and in the
modulation of DNA repair.

htes3_lQnlD: The new protein can find application in
studying the expression profile of testis-specific genes.

htes3_llal?: The new protein can find application in
studying the expression profile of testis-specific genes and
as a new marker for testicular cells.

htes3_ llcEE: The new protein can find application in
modulating/blocking of regulatory pathways.

htes3_lld21: The new protein can find application in
diagnosis of diseases due to abnormal protein degradation
like muscular dystrophy or multiple sclerosis as well as in
modulating the half life of specific proteins and in
expression profiling.

Kidney:

hfkdE_3kl: The new protein can find application in modulation
of endocytosis. Strong similarity to testicular dynamin
(Rattus norvegicus).
Amygdala:

hamy2_lDhl7: The new protein can find application in
modulating protein-protein interaction and in studying the
expression profile of amygdala-specific genes.

hamyS_lDp?: The new protein can find application in
modulation of Na+/Ca2+ exchange and voltage-dependent
processes.

hamyE_lld5: The new protein can find application in studying
the expression profile of amygdala-specific genes and as a
new marker for amygdala cells.

hamyE_lln l 4: The new protein can find application in
modulation of DNA repair and as a new tool for
manipulation of nucleic acids.

hamy2_3i21f 11: The new protein can find application in
modulation of cytoskeleton-membrane interactions.

Fetal Brain:

hfbrE_7acl2: The new protein can find application in the
modulation of translational pathways.

hf br2_ ?adlfl: The new protein can find application in
studying the expression profile of brain-specific genes.

hfbr2_7fidH: The new protein can find application in studying
the expression profile of brain-specific genes and as a new
marker for amygdala cells.

hfbr2_?fleia: The new protein can find application in
studying the expression profile of brain-specific genes.

hf br2_?ai51: The new protein can find application in
diagnosis/modulation of protein damage and age-related
degenerative processes.
Melanoma:

hmelE_lE jl: The new protein can find application in studying
the expression profile of melanoma-specific genes.

hmel2_ 7gl4: The new protein can find application in
modulation of the sorting of proteins into different
compartments.

hmel2_7kn: The new protein can find application in studying
the expression profile of melanoma-specific genes.

VARIANTS OF THE INVENTIVE DNA MOLECULES 

Variants in General

"Variants," according to the invention, include DNA and/or
protein molecules that resemble, structurally and/or functionally,
those set forth herein. Variants may be isolated from natural
sources ("homologs"), may be entirely synthetic or may be based in
part on both natural and synthetic approaches.

The section set forth below presents various structural and
functional characteristics of molecules within the invention.
Preferred molecules are characterized by a combination of one or
more of these characteristics. For instance, some preferred
molecules are described with reference to at least two structural
characteristics, while others may be described with reference to
at least one structural and at least one functional
characteristic.

It will be recognized by the skilled artisan that structure
ultimately defines function, i.e. the functions of the molecules
described herein derive from the structures of those molecules.
Accordingly, the structural variants described below that bear the
closest structural relationship (as variously defined below) to
the inventive molecules are the variants that most likely will
preserve biological function. This relationship between structure
and function will guide the skilled artisan in identifying the
preferred embodiments of the invention.
Splicing Variants

It is well-known that eukaryotic structural genes are
composed of both protein coding and non-coding portions. When
the messenger RNA is transcribed from the DNA template, it
contains introns, which are non-coding, and exons, which are
coding. In order to form a translation competent mRNA, the
introns must be "spliced" out of this initial pre-mRNA.

Specific sequences within the pre-mRNA represent "splice
junctions" that direct the cellular splicing machinery to the
appropriate position. The splice junctions are loosely conserved
sequence regions of the pre-mRNA, which almost invariably begin
with GT and end with AG (DNA perspective). The 5' end of the
splice junction typically contains about nine somewhat conserved
residues, for example, C/AAGTA/GAGT. The 3' end usually contains
a pyrimidine rich stretch of at least about 11 nucleotides,
followed by NC/TAGG. Splicing occurs before the GT and after the
AG. Mount, Nucleic Acids Res. 10:459-72 (1982).

Interestingly, exons often correspond to discrete functional
domains of the protein product. The intron/exon arrangement thus
creates a linear array of nucleotides which can be correlated to
discrete, and often interchangeable, functional protein fragments.
Go, Nature 291:90-92 (1981); Branden et al., EMBO J. 3:1307-10
(1984). This linear arrangement creates the possibility of
generating multiple different full length proteins by rearranging
the order of the different functional portions in the array. For
example, if a set of exons are arranged 1-2-3-4, where (-)
represents the introns separating the exons, a splicing event need
not simply produce 1234, but may produce 123, 134, 124 and so on.
Production of different mRNA products in this way is commonly
called "alternative splicing." Andreadis et al., Ann. Rev. Cell
Biol. 3:207-42 (1987).
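The exon combinations named in the 1-2-3-4 example can be enumerated with a short sketch. It assumes, purely for illustration, that exon order is always preserved and that any non-empty subset of exons may be retained; real splicing is more constrained than this. The function name splice_isoforms is hypothetical and is not part of the disclosure.

```python
from itertools import combinations


def splice_isoforms(exons):
    """Return every order-preserving, non-empty selection of exons.

    Illustrative assumption: any exon may be skipped, and the relative
    order of the retained exons never changes.
    """
    isoforms = []
    # Start with the full-length product, then progressively shorter ones.
    for size in range(len(exons), 0, -1):
        for combo in combinations(exons, size):
            isoforms.append("".join(combo))
    return isoforms


isoforms = splice_isoforms(["1", "2", "3", "4"])
print(len(isoforms))   # number of candidate products for four exons
print(isoforms[:5])    # full-length product followed by three-exon products
```

Under these assumptions four exons give 2^4 - 1 candidate products, which include the 123, 134 and 124 products mentioned in the text.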

Some of the present DNA molecules can be represented in
modular fashion in terms of their coding regions. Essentially,
these modules are exons (though each "exon" may in fact be made up
of several exons), which may be combined in different ways to form
a variety of different DNA molecules, each encoding a different
functional protein. Splicing variants are indicated in the
Description of the Sequences.
Degenerate Variants

One aspect of the present invention provides "degenerate
variants" of the nucleic acid fragments of the present invention.
A "degenerate variant" is a nucleotide fragment which differs from
those of inventive molecules by nucleotide sequence, but due to
the degeneracy of the genetic code, encodes an identical
polypeptide sequence.

Given the known relationship between DNA sequences and the
proteins they encode, degenerate variants typically are described
by reference to this relationship. It is well known that the
degeneracy of the genetic code results in many possible DNA
sequences which encode a particular protein. Indeed, of the three
bases which comprise an amino acid-encoding triplet, the third
position, and often the second, almost always may vary. This fact
alone allows for a class of variant DNA molecules which encode
protein sequences identical to those disclosed herein, yet have
about 30% sequence variation. In other words, the variant DNA
molecules are about 70% identical to the inventive DNAs, having no
additional or deleted sequences. Thus, one aspect of the
invention provides degenerate variant DNA molecules encoding the
inventive protein sequences.

In one embodiment, these variants have at least about 70%
sequence identity with the DNA molecules described herein. In a
preferred embodiment, these variants have at least about 80%
sequence identity to the inventive molecules. In a more preferred
embodiment these variants have at least about 90% sequence
identity with the inventive molecules.
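The notion of a degenerate variant can be made concrete with a small sketch. It assumes the standard genetic code and an in-frame coding sequence whose length is a multiple of three; the helper names (translate, degenerate_variant, percent_identity) and the example sequence are hypothetical and are not taken from the Description of the Sequences.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate an in-frame DNA string into a one-letter protein string."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def degenerate_variant(dna):
    """Swap each codon for the synonymous codon that differs from it most."""
    out = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == CODON_TABLE[codon]]
        out.append(max(synonyms, key=lambda c: hamming(c, codon)))
    return "".join(out)


def percent_identity(a, b):
    """Percentage of identical positions between two equal-length strings."""
    return 100.0 * (len(a) - hamming(a, b)) / len(a)


original = "ATGCTGAGCCGT"          # hypothetical coding fragment
variant = degenerate_variant(original)
print(translate(original), translate(variant))
print(round(percent_identity(original, variant), 1))
```

The two DNA strings encode the same peptide while differing at the nucleotide level, which is the property the section relies on. How far the identity drops depends on the amino acid composition, since methionine and tryptophan each have only one codon.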

Conservative Amino Acid Variants

Variants according to the invention also may be made that
conserve the overall molecular structure of the encoded proteins.
Given the properties of the individual amino acids comprising the
disclosed protein products, some rational substitutions will be
recognized by the skilled worker. Amino acid substitutions, i.e.
"conservative substitutions," may be made, for instance, on the
basis of similarity in polarity, charge, solubility,
hydrophobicity, hydrophilicity, and/or the amphipathic nature of
the residues involved.
For example: (a) nonpolar (hydrophobic) amino acids include
alanine, leucine, isoleucine, valine, proline, phenylalanine,
tryptophan, and methionine; (b) polar neutral amino acids include
glycine, serine, threonine, cysteine, tyrosine, asparagine, and
glutamine; (c) positively charged (basic) amino acids include
arginine, lysine, and histidine; and (d) negatively charged
(acidic) amino acids include aspartic acid and glutamic acid.
Substitutions typically may be made within groups (a)-(d). In
addition, glycine and proline may be substituted for one another
based on their ability to disrupt α-helices. Similarly, certain
amino acids, such as alanine, cysteine, leucine, methionine,
glutamic acid, glutamine, histidine and lysine are more commonly
found in α-helices, while valine, isoleucine, phenylalanine,
tyrosine, tryptophan and threonine are more commonly found in β-
pleated sheets. Glycine, serine, aspartic acid, asparagine, and
proline are commonly found in turns. Some preferred substitutions
may be made among the following groups: (i) S and T; (ii) P and G;
and (iii) A, V, L and I. Given the known genetic code, and
recombinant and synthetic DNA techniques, the skilled scientist
readily can construct DNAs encoding the conservative amino acid
variants.

As used herein, "sequence identity" between two polypeptide
sequences indicates the percentage of amino acids that are
identical between the sequences. "Sequence similarity" indicates
the percentage of amino acids that either are identical or that
represent conservative amino acid substitutions.
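The difference between identity and similarity can be sketched as follows, using groups (a)-(d) as listed in this section. The sketch assumes two already aligned, equal-length, gap-free peptide strings in one-letter code; the function name and the example peptides are hypothetical.

```python
# Groups (a)-(d) from the text, in one-letter amino acid code.
GROUPS = [
    set("ALIVPFWM"),   # (a) nonpolar (hydrophobic)
    set("GSTCYNQ"),    # (b) polar neutral
    set("RKH"),        # (c) positively charged (basic)
    set("DE"),         # (d) negatively charged (acidic)
]


def is_conservative(x, y):
    """True if two different residues fall within the same group."""
    return x != y and any(x in g and y in g for g in GROUPS)


def identity_and_similarity(seq1, seq2):
    """Return (percent identity, percent similarity) for aligned peptides.

    Assumption: the sequences are pre-aligned, equal in length, gap-free.
    """
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    identical = sum(x == y for x, y in zip(seq1, seq2))
    conservative = sum(is_conservative(x, y) for x, y in zip(seq1, seq2))
    n = len(seq1)
    return 100.0 * identical / n, 100.0 * (identical + conservative) / n


# K->R (basic), T->S (polar neutral) and L->I (nonpolar) are conservative.
print(identity_and_similarity("MKTLE", "MRSIE"))
```

Similarity is never lower than identity, because every identical position is also counted toward similarity.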

Functionally Equivalent Variants

Yet another class of DNA variants within the scope of the
invention may be described with reference to the product they
encode. As shown in the Description of the Sequences, some of the
inventive DNA molecules encode a protein having a degree of
homology with known proteins, or protein domains. It is expected,
therefore, that they will have some or all of the requisite
functional features of such molecules. These "functionally
equivalent variants" are characterized by the fact that
they are functionally equivalent, with respect to biological
activity, to certain known molecules.
Also provided herein is information on common structural
motifs, including consensus sequences that will guide the artisan
in constructing functionally equivalent variants. It will be
understood that the motifs, identified in the Description of the
Sequences for each inventive protein, may be modified within the
identified consensus sequences. Thus, the invention contemplates
the proteins in the Description of the Sequences that contain
variability in the consensus sequences identified, and the
invention further contemplates the full range of nucleic acids
encoding them, and the complements of those nucleic acids.

Hybridizing Variants

DNA variants within the invention also may be described by
reference to their physical properties in hybridization. One
skilled in the field will recognize that DNA can be used to
identify its complement and, since DNA is double stranded, its
equivalent or homolog, using nucleic acid hybridization
techniques. It will also be recognized that hybridization can
occur with less than 100% complementarity. However, given
appropriate choice of conditions, hybridization techniques can be
used to differentiate among DNA sequences based on their
structural relatedness to a particular probe. For guidance
regarding such conditions see, for example, Sambrook et al., 1989,
MOLECULAR CLONING, A LABORATORY MANUAL, Cold Spring Harbor Press,
N.Y.; and Ausubel et al., 1989, CURRENT PROTOCOLS IN MOLECULAR
BIOLOGY, Green Publishing Associates and Wiley Interscience, N.Y.

Structural relatedness between two polynucleotide sequences
can be expressed as a function of "stringency" of the conditions
under which the two sequences will hybridize with one another. As
used herein, the term "stringency" refers to the extent that the
conditions disfavor hybridization. Stringent conditions strongly
disfavor hybridization, and only the most structurally related
molecules will hybridize to one another under such conditions.
Conversely, non-stringent conditions favor hybridization of
molecules displaying a lesser degree of structural relatedness.
Hybridization stringency, therefore, directly correlates with the
structural relationships of two nucleic acid sequences. The
following relationships are useful in correlating hybridization
and relatedness (where $T_m$ is the melting temperature of a nucleic
acid duplex):



a. $T_m = 69.3 + 0.41\,(G+C)\%$

b. The $T_m$ of a duplex DNA decreases by 1°C with every
increase of 1% in the number of mismatched base pairs.

c. $(T_m)_{\mu_2} - (T_m)_{\mu_1} = 18.5 \log_{10}(\mu_2/\mu_1)$,
where $\mu_1$ and $\mu_2$ are the ionic strengths of two
solutions.
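Relationships a to c can be combined in a short numerical sketch. The constant 69.3 in relationship a is the value of the commonly cited form of that equation and is treated here as an assumption, as are the function names and the example inputs.

```python
import math


def tm_from_gc(gc_percent):
    """Relationship a: melting temperature (deg C) from percent G+C.

    Assumption: the constant term is 69.3, as in the commonly cited form.
    """
    return 69.3 + 0.41 * gc_percent


def tm_with_mismatch(tm, mismatch_percent):
    """Relationship b: Tm falls by 1 deg C per 1% mismatched base pairs."""
    return tm - 1.0 * mismatch_percent


def tm_shift_for_ionic_strength(mu1, mu2):
    """Relationship c: change in Tm when moving from ionic strength mu1 to mu2."""
    return 18.5 * math.log10(mu2 / mu1)


perfect = tm_from_gc(50.0)                   # fully matched duplex, 50% G+C
diverged = tm_with_mismatch(perfect, 10.0)   # same duplex with 10% mismatches
shift = tm_shift_for_ionic_strength(1.0, 0.1)  # tenfold drop in ionic strength
print(round(perfect, 1), round(diverged, 1), round(shift, 1))
```

The sketch shows why both sequence divergence and lower salt reduce duplex stability: each lowers the melting temperature, so fewer mismatched duplexes survive a given temperature.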

Hybridization stringency is a function of many factors,
including overall DNA concentration, ionic strength, temperature,
probe size and the presence of agents which disrupt hydrogen
bonding. Factors promoting hybridization include high DNA
concentrations, high ionic strengths, low temperatures, longer
probe size and the absence of agents that disrupt hydrogen
bonding.

Hybridization usually is done in two stages. First, in the
"binding" stage, the probe is bound to the target under conditions
favoring hybridization. Stringency is usually controlled at this
stage by altering the temperature. For high stringency, the
temperature is usually between 65°C and 70°C, unless short (<20
nt) oligonucleotide probes are used. A representative
hybridization solution comprises 6X SSC, 0.5% SDS, 5X Denhardt's
solution and 100 μg of non-specific carrier DNA. See Ausubel et
al., supra, section 2.9, supplement 27 (1994). Of course many
different, yet functionally equivalent, buffer conditions are
known. Where the degree of relatedness is lower, a lower
temperature may be chosen. Low stringency binding temperatures
are between about 25°C and 40°C. Medium stringency is between at
least about 40°C to less than about 65°C. High stringency is at
least about 65°C.
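The binding-temperature ranges can be restated as a small classifier. It assumes the thresholds are 25°C, 40°C and 65°C as read from the passage above, and it treats the "about" boundaries as exact cut-offs for simplicity; the function name is hypothetical.

```python
def binding_stringency(temp_c):
    """Classify a binding temperature (deg C) into a stringency class.

    Assumed cut-offs: low is 25 to below 40, medium is 40 to below 65,
    high is 65 and above. The text gives these only as approximate values.
    """
    if temp_c >= 65:
        return "high"
    if temp_c >= 40:
        return "medium"
    if temp_c >= 25:
        return "low"
    return "below the stated low-stringency range"


for t in (30, 42, 65, 68):
    print(t, binding_stringency(t))
```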

Second, the excess probe is removed by washing. It is at
this stage that more stringent conditions usually are applied.
Hence, it is this "washing" stage that is most important in
determining relatedness via hybridization. Washing solutions
typically contain lower salt concentrations. One exemplary medium
stringency solution contains 2X SSC and 0.1% SDS. A high
stringency wash solution contains the equivalent (in ionic
strength) of less than about 0.2X SSC, with a preferred stringent
solution containing about 0.1X SSC. The temperatures associated
with various stringencies are the same as discussed above for
"binding." The washing solution also typically is replaced a
number of times during washing. For example, typical high
stringency washing conditions comprise washing twice for 30
minutes at 55°C and three times for 15 minutes at 60°C.

The present invention includes nucleic acid molecules that
hybridize to the inventive molecules under high stringency binding
and washing conditions. More preferred molecules (from an mRNA
perspective) are those that are at least 50% of the length of any
one of those depicted in the Description of the Sequences.
Particularly preferred molecules are at least 75% of the length
of those molecules.

Substitutions, Insertions, Additions and Deletions 

In a general sense, the preferred DNA variants of the
invention are those that retain the closest relationship, as
described by "sequence identity," to the inventive DNA molecules.
According to another aspect of the invention, therefore,
substitutions, insertions, additions and deletions of defined
properties are contemplated. It will be recognized that sequence
identity between two polynucleotide sequences, as defined herein,
generally is determined with reference to the protein coding
region of the sequences. Thus, this definition does not at all
limit the amount of DNA, such as vector DNA, that may be attached
to the molecules described herein. Preferred DNA sequence
variants include molecules encoding proteins sharing some or all
of any relevant biological activity of the native molecule.

In creating these variants, the skilled worker will be guided
by reference to the protein structure. First, insertions and
deletions in any recognized functional domain above generally
should be avoided, except as noted below in the section entitled
"Proteins," where this domain is discussed in detail. Alterations
in such domains usually will be limited to conservative amino acid
substitutions. In addition, where insertions and deletions are
desired, this may be accomplished at the N- and/or C-terminus of
the protein molecule (or the corresponding coding regions of the
DNA). If insertions or deletions are made within the protein,
deletions of major structural features usually should be avoided.
Thus, a preferred place to make insertion or deletion variants is
in non-structural regions, such as linker regions between two
alpha helices.

"Substitutions" generally refer to alterations in the DNA
sequence which do not change its overall length, but only alter
one or more nucleotide positions, substituting one for another in
the common sense of the word. One class of preferred
substitutions, "degenerate substitutions," are those that do not
alter the encoded amino acid sequence. Some substitutions retain
50%, 55%, 60% or 65% identity. Preferred substitutions retain at
least about 70% identity, more preferably at least 70% or 75%
identity, with the inventive DNAs. Some more preferred molecules
have at least about 80% identity, more preferably at least 80% or
85% identity. Particularly preferred DNAs share at least about
90% identity, more preferably at least 90% or 95% identity.

"Insertions," unlike substitutions, alter the overall length
of the DNA molecule, and thus sometimes the encoded protein.
Insertions add extra nucleotides to the interior (not the 5' or 3'
ends) of the subject DNAs. Preferred insertions are made with
reference to the protein sequence encoded by the DNA. Thus, it is
most preferred to provide an insertion in the DNA at a location
that corresponds to an area of the encoded protein which lacks
structure. For instance, it typically would not be beneficial, if
the preservation of biological activity is desired, to provide an
insertion within an alpha-helical region or a beta-pleated sheet.
Accordingly, non-structural areas, such as those containing helix-
breaking glycine and proline residues, are most preferred sites
of insertion. Other preferred sites of insertion are the splice
sites, which are indicated above in the description of the
inventive DNA molecules.
While the optimal size of insertions will vary depending upon
the site of insertion and its effect on the overall conformation
of the encoded protein, some general guides are useful.
Generally, the total insertions (irrespective of their number)
should not add more than about 30% (or preferably not more than
30%) to the overall size of the encoded protein. More preferably,
the insertion adds less than about 10-20% (yet more preferably 10-
20%) in size, with less than about 10% being most preferred. The
number of insertions is limited only by the number of suitable
insertion sites, and secondarily by the foregoing size
preferences.

"Additions," like insertions, also add to the overall size of
the DNA molecule, and usually the encoded protein. However,
instead of being made within the molecule, they are made on the 5'
or 3' end, usually corresponding to the N- or C-terminus of the
encoded protein. Unlike deletions, additions are not very size-
dependent. Indeed, additions may be of virtually any size.
Preferred additions, however, do not exceed about 100% of the size
of the native molecule. More preferably, they add less than about
60 to 30% to the overall size, with less than about 30% being most
preferred.

"Deletions" diminish the overall size of the DNA and,
therefore, also reduce the size of the protein encoded by that
DNA. Deletions may be made from either end of the molecule or
internal to it. Typical preferred deletions remove discrete
structural features of the encoded protein. For example, some
deletions will comprise the deletion of one or more exons which
may define a structural feature. Preferred deletions remove less
than about 30% of the size of the subject molecule. More
preferred deletions remove less than about 20% and most preferred
deletions remove less than about 10%.

Computer-Defined Variants and Definition of "Sequence Identity"

In general, both the DNA and protein molecules of the
invention can be defined with reference to "sequence identity."
As used herein, "sequence identity" refers to a comparison made
between two molecules using, for example, the standard
Smith-Waterman algorithm that is well known in the art.
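
By way of illustration only, a Smith-Waterman comparison of the kind
referred to above can be sketched as follows. The scoring values
(match +2, mismatch -1, linear gap -2), the function name and the
choice to report identity over the aligned region are assumptions made
for this sketch; they are not parameters specified herein.

```python
def smith_waterman_identity(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b: return (score, percent identity).

    Scoring values are illustrative assumptions, not values from the text.
    Percent identity is computed over the aligned columns, gaps included.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic-programming matrix; cells are floored at zero,
    # which is what makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(0,
                       score[i - 1][j - 1] + sub,
                       score[i - 1][j] + gap,
                       score[i][j - 1] + gap)
            score[i][j] = cell
            if cell > best:
                best, best_pos = cell, (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    identical = aligned = 0
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        aligned += 1

    percent = 100.0 * identical / aligned if aligned else 0.0
    return best, percent


if __name__ == "__main__":
    print(smith_waterman_identity("ACGTTGCA", "ACGATGCA"))
```

In practice an established alignment tool would be used; the sketch
only shows how a percent-identity figure can be derived from a local
alignment.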

Some molecules have at least about 50%, 55% or 60% identity.
Preferred molecules are those having at least about 65% sequence
identity, more preferably at least 65% or 70% sequence identity.
Other preferred molecules have at least about 80%, more preferably
at least 80% or 85%, sequence identity. Particularly preferred
molecules have at least about 90% sequence identity, more
preferably at least 90% sequence identity. Most preferred
molecules have at least about 95%, more preferably at least 95%,
sequence identity. As used herein, two nucleic acid molecules or
proteins are said to "share significant sequence identity" if the
two contain regions which possess greater than 85% sequence (amino
acid or nucleic acid) identity.

"Sequence identity" is defined herein with reference to the
Blast 2 algorithm, which is available at the NCBI
(http://www.ncbi.nlm.nih.gov/BLAST) using default parameters.
References pertaining to this algorithm include those found at
http://www.ncbi.nlm.nih.gov/BLAST/blast_references.html;
Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J.
(1990) "Basic local alignment search tool." J. Mol. Biol.
215:403-410; Gish, W. & States, D.J. (1993) "Identification of
protein coding regions by database similarity search." Nature
Genet. 3:266-272; Madden, T.L., Tatusov, R.L. & Zhang, J. (1996)
"Applications of network BLAST server" Meth. Enzymol. 266:131-141;
Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J.,
Zhang, Z., Miller, W. & Lipman, D.J. (1997) "Gapped BLAST and
PSI-BLAST: a new generation of protein database search programs."
Nucleic Acids Res. 25:3389-3402; and Zhang, J. & Madden, T.L.
(1997) "PowerBLAST: A new network BLAST application for
interactive or automated sequence analysis and annotation."
Genome Res. 7:649-656.

METHODS OF MAKING VARIANTS 

It will be recognized that variants of the inventive
molecules can be constructed in several different ways. For
example, they may be constructed as completely synthetic DNAs.
Methods of efficiently synthesizing oligonucleotides in the range
of 20 to about 150 nucleotides are widely available. See Ausubel
et al., supra, section 2.11, Supplement 21 (1993). Overlapping
oligonucleotides may be synthesized and assembled in a fashion
first reported by Khorana et al., J. Mol. Biol. 72:209-217 (1971);
see also Ausubel et al., Section 8.2. The synthetic DNAs are
designed with convenient restriction sites engineered at the 5'
and 3' ends of the gene to facilitate cloning into an appropriate
vector.
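
As a rough illustration of the overlapping-oligonucleotide approach,
the following sketch tiles a designed sequence into oligonucleotides
on alternating strands. The function names, the 60-nucleotide length
and the 20-nucleotide overlap are hypothetical choices for the example
(the length merely falls within the 20 to about 150 nucleotide range
mentioned above); the sketch does not check melting temperatures or
secondary structure.

```python
# Translation table used to complement unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def tile_oligos(seq, oligo_len=60, overlap=20):
    """Split seq into overlapping oligos on alternating strands.

    Even-numbered oligos are top-strand; odd-numbered oligos are
    returned as reverse complements so that neighbours can anneal
    through their shared overlap. The final oligo may be shorter
    than oligo_len. All parameter values are illustrative.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be between 0 and oligo_len")
    step = oligo_len - overlap
    oligos = []
    start = 0
    index = 0
    while True:
        window = seq[start:start + oligo_len]
        oligos.append(window if index % 2 == 0 else reverse_complement(window))
        if start + oligo_len >= len(seq):
            break
        start += step
        index += 1
    return oligos


if __name__ == "__main__":
    gene = "ATGGCTAGCAAGGAGCTGTTCACCGGGGTGGTGCCCATCC" * 3
    for n, oligo in enumerate(tile_oligos(gene)):
        print(n, len(oligo), oligo)
```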

An alternative method of generating variants is to start with
one of the inventive DNAs and then to conduct site-directed
mutagenesis. See Ausubel et al., supra, chapter 8, Supplement 37
(1997). In a typical method, a target DNA is cloned into a
single-stranded DNA bacteriophage vehicle. Single-stranded DNA is
isolated and hybridized with an oligonucleotide containing the
desired nucleotide alteration(s). The complementary strand is
synthesized and the double-stranded phage is introduced into a
host. Some of the resulting progeny will contain the desired
mutant, which can be confirmed using DNA sequencing. In addition,
various methods are available that increase the probability that
the progeny phage will be the desired mutant. These methods are
well known to those in the field and kits are commercially
available for generating such mutants.
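
The oligonucleotide used in such a method can be sketched as below: a
single-base change flanked by unaltered sequence, returned as the
reverse complement so that it would anneal to a single-stranded
template carrying the sequence as written. The function name, the
12-base flank and the assumption that the template is the sense strand
are illustrative only.

```python
# Translation table used to complement unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mutagenic_oligo(target, pos, new_base, flank=12):
    """Design an oligo introducing a single substitution at target[pos].

    target   : template sequence (assumed here to be the cloned strand)
    pos      : 0-based index of the base to change
    new_base : replacement base
    flank    : unaltered bases kept on each side (hypothetical default)
    Returns the reverse complement of the mutated window.
    """
    if not 0 <= pos < len(target):
        raise IndexError("pos lies outside the target sequence")
    if target[pos] == new_base:
        raise ValueError("new_base is identical to the existing base")
    left = max(0, pos - flank)
    right = min(len(target), pos + flank + 1)
    window = target[left:pos] + new_base + target[pos + 1:right]
    return window.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(mutagenic_oligo("AAAACCCCGGGGTTTT", 8, "T", flank=3))
```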

ISOLATING HOMOLOGS 
Methods 

By using the sequences disclosed herein as probes or as
primers, and techniques such as PCR cloning and colony/plaque
hybridization, one skilled in the art can obtain homologs.
"Homologs" are essentially naturally-occurring variants and
include allelic, species-specific and tissue-specific variants.

Region-specific primers or probes derived from the nucleotide
sequence(s) provided can be used to prime DNA synthesis and PCR
amplification, as well as to identify colonies containing cloned
DNA encoding a homolog using known methods (Innis et al., PCR
Protocols, Academic Press, San Diego, CA (1990)). Such an
application is useful in diagnostic methods, as described in more
detail below, as well as in preparing full-length DNAs from
various sources. The PCR primers are preferably at least 15
bases, and more preferably at least 18 bases in length. When
selecting a primer sequence, it is preferred that the primer pairs
have approximately the same G/C ratio, so that melting
temperatures are approximately the same. As a general guide, the
formula 3(G+C) + 2(A+T) = °C is useful.
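
The general guide above lends itself to a short calculation. The
sketch below applies the formula as stated in this text, with the
weights exposed as parameters; the function names and the 2 °C
tolerance used to compare a primer pair are assumptions for
illustration. (The widely cited Wallace rule weights G+C by 4 rather
than 3; the defaults here follow the present text.)

```python
def approx_melting_temp(primer, gc_weight=3, at_weight=2):
    """Estimate a primer melting temperature in degrees C.

    Implements gc_weight*(G+C) + at_weight*(A+T). The defaults follow
    the formula given in the text; other weights can be passed in.
    """
    p = primer.upper()
    gc = p.count("G") + p.count("C")
    at = p.count("A") + p.count("T")
    return gc_weight * gc + at_weight * at


def primers_matched(forward, reverse, tolerance=2):
    """True if the two estimates differ by no more than tolerance.

    The tolerance is a hypothetical cut-off, not a value from the text.
    """
    difference = abs(approx_melting_temp(forward) - approx_melting_temp(reverse))
    return difference <= tolerance


if __name__ == "__main__":
    fwd = "ATGCATGCATGCATGCAT"
    rev = "GCATGCATATGCATGCAA"
    print(approx_melting_temp(fwd), approx_melting_temp(rev),
          primers_matched(fwd, rev))
```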

When using primers derived from the inventive sequences, one
skilled in the art will recognize that by employing high
stringency conditions (e.g., annealing at 50-60°C), only sequences
with greater than 75% sequence identity to the primer will be
amplified. By employing lower stringency conditions (e.g.,
annealing at 35-37°C), sequences which have greater than 40-50%
sequence identity to the primer also will be amplified.

The PCR product may be subcloned and sequenced to confirm
that it indeed displays the expected sequence identity. The PCR
fragment may then be used to isolate a full length cDNA clone by a
variety of methods. For example, the amplified fragment may be
labeled and used to screen a bacteriophage cDNA library.
Alternatively, the labeled fragment may be used to screen a
genomic library.

PCR technology may also be utilized to isolate full length
cDNA sequences. For example, RNA may be isolated, following
standard procedures, from an appropriate cellular or tissue
source. A reverse transcription reaction may be performed on the
RNA using an oligonucleotide primer specific for the most 5' end
of the amplified fragment for the priming of first strand
synthesis. The resulting RNA/DNA hybrid may then be "tailed" with
guanines using a standard terminal transferase reaction, the
hybrid may be digested with RNAase H, and second strand synthesis
may then be primed with a poly-C primer. Thus, cDNA sequences
upstream of the amplified fragment may easily be isolated. For a
review of cloning strategies which may be used, see e.g., Sambrook
et al., 1989, supra.

When using DNA probes derived from the inventive sequences
for colony/plaque hybridization, one skilled in the art will
recognize that by employing medium to high stringency conditions
(e.g., hybridizing at 50-65°C in 5X SSPC and 50% formamide, and
washing at 50-65°C in 0.5X SSPC), sequences having regions with
greater than 90% sequence identity to the probe can be obtained,
and that by employing lower stringency conditions (e.g.,
hybridizing at 35-37°C in 5X SSPC and 40-45% formamide, and
washing at 42°C in SSPC), sequences having regions with greater
than [...] sequence identity to the probe will be obtained.

Suitably, genomic or cDNA libraries can be constructed and
screened in accord with the previous paragraph. The libraries
should be derived from a tissue or organism that is known to
express the gene of interest, or that is suspected of expressing
the gene. The clone containing the homolog may then be purified
through methods routinely practiced in the art, and subjected to
sequence analysis.

Additionally, an expression library can be constructed
utilizing DNA isolated from or cDNA synthesized from a tissue or
organism that is known to express the gene of interest, or that is
suspected of expressing the gene. In this manner, clones may be
induced and screened using standard antibody screening techniques
in conjunction with antibodies raised against the normal gene
product, as described herein. (For screening techniques, see, for
example, Harlow, E. and Lane, eds., 1988, ANTIBODIES: A
LABORATORY MANUAL, Cold Spring Harbor Press, Cold Spring
Harbor.)

Human Homologs 

Any organism or tissue can be used as the source for homologs
of the present invention so long as the organism or tissue
naturally expresses such a protein or contains genes encoding the
same. The most preferred organism for isolating homologs is
human.

PROTEINS OF THE INVENTION 

One class of proteins included within the invention is
encoded by the inventive DNA molecules presented. Other proteins
according to the invention are those encoded by the DNA variants
described above. As noted, these variants are designed with the
encoded proteins in mind.

A preferred class of protein fragments includes those
fragments which retain any biological activity. These molecules
share functional features common to the family of proteins,
although these characteristics may vary in degree.

According to one aspect of the invention, fragments of the
inventive proteins are contemplated. Some preferred fragments are
those which are capable of eliciting an immune response.
Generally these "antigenic" fragments will be from about five
amino acids in length to about fifty amino acids in length. Some
preferred antigenic fragments are from five to about twenty amino
acids long. "Antigenic" response may refer to a T cell response,
a B cell response or a response by cells of the
macrophage/monocyte lineages. In most cases, however, it will
refer to the immune response involved in the generation of
antibodies. In other words, the relevant immune response is that
of helper T cells and/or B cells. These preferred molecules
comprise one or more T cell and/or B cell epitopes.

ANTIBODIES OF THE INVENTION

Antibodies raised against the proteins and protein fragments
of the invention also are contemplated by the invention.
Described below are antibody products and methods for producing
antibodies capable of specifically recognizing one or more
epitopes of the presently described proteins and their
derivatives.

Antibodies include, but are not limited to, polyclonal
antibodies, monoclonal antibodies (mAbs), humanized or chimeric
antibodies, single chain antibodies including single chain Fv
(scFv) fragments, Fab fragments, F(ab')2 fragments, fragments
produced by a Fab expression library, anti-idiotypic (anti-Id)
antibodies, epitope-binding fragments, and humanized forms of any
of the above.

As known to one in the art, these antibodies may be used, for
example, in the detection of a target protein in a biological
sample. They also may be utilized as part of treatment methods,
and/or may be used as part of diagnostic techniques whereby
patients may be tested for abnormal levels or for the presence of
abnormal forms of such proteins.

In general, techniques for preparing polyclonal and
monoclonal antibodies as well as hybridomas capable of producing
the desired antibody are well known in the art (Campbell, A.M.,
Monoclonal Antibody Technology: Laboratory Techniques in
Biochemistry and Molecular Biology, Elsevier Science Publishers,
Amsterdam, The Netherlands (1984); St. Groth et al., J. Immunol.
Methods 35:1-21 (1980); Kohler and Milstein, Nature 256:495-497
(1975)), the trioma technique, the human B-cell hybridoma
technique (Kozbor et al., Immunology Today 4:72 (1983); Cole et
al., in Monoclonal Antibodies and Cancer Therapy, Alan R. Liss,
Inc. (1985), pp. 77-96). Antibodies may also be generated by the
known techniques of phage display and in vitro immunization.

Polyclonal Antibodies

Polyclonal antibodies are heterogeneous populations of
antibody molecules derived from the sera of animals immunized with
an antigen, such as an inventive protein or an antigenic
derivative thereof.

Polyclonal antisera containing antibodies to heterogeneous
epitopes of a single protein can be prepared by immunizing
suitable animals with the expressed protein described above, which
can be unmodified or modified, as known in the art, to enhance
immunogenicity. Immunization methods include subcutaneous or
intraperitoneal injection of the polypeptide.

Effective polyclonal antibody production is affected by many
factors related both to the antigen and to the host species. For
example, small molecules tend to be less immunogenic than others
and may require the use of carriers and/or adjuvants. In addition,
host animal response may vary with site of inoculation. Both
inadequate and excessive doses of antigen may result in low titer
antisera. In general, however, small doses (high ng to low µg
levels) of antigen administered at multiple intradermal sites
appear to be most reliable. Host animals may include but are not
limited to rabbits, mice, chickens and rats, to name but a few.
An effective immunization protocol for rabbits can be found in
Vaitukaitis, J. et al., J. Clin. Endocrinol. Metab. 33:988-991
(1971).

The protein immunogen may be modified or administered in an
adjuvant in order to increase the protein's antigenicity. Methods
of increasing the antigenicity of a protein are well known in the
art and include, but are not limited to, coupling the antigen with
a heterologous protein (such as globulin or β-galactosidase) or
through the inclusion of an adjuvant during immunization.
Adjuvants include Freund's (complete and incomplete), mineral gels
such as aluminum hydroxide, surface active substances such as
lysolecithin, pluronic polyols, polyanions, peptides, oil
emulsions, keyhole limpet hemocyanin, dinitrophenol, and
potentially useful human adjuvants such as BCG (bacille
Calmette-Guerin) and Corynebacterium parvum.

Booster injections can be given at regular intervals, with at
least one usually being required for optimal antibody production.
The antiserum may be harvested when the antibody titer begins to
fall. Titer may be determined semi-quantitatively, for example,
by double immunodiffusion in agar against known concentrations of
the antigen. See, for example, Ouchterlony et al., Chap. 19 in:
Handbook of Experimental Immunology, Weir, ed., Blackwell (1973).
Plateau concentration of antibody is usually in the range of
[...] to 0.2 mg/ml of serum (about 12 µM). The antiserum may be
purified by affinity chromatography using the immobilized
immunogen carried on a solid support. Such methods of affinity
chromatography are well known in the art.

Affinity of the antisera for the antigen may be determined by
preparing competitive binding curves, as described, for example,
by Fisher, Chap. 42 in: Manual of Clinical Immunology, second
edition, Rose and Friedman, eds., Amer. Soc. For Microbiology,
Washington, D.C. (1980).

In addition to using protein as the immunogen, DNA molecules
may be used directly. In this manner, a DNA encoding the protein
immunogen is administered. Boosting and harvesting are done in a
manner analogous to that detailed above. Yet another method of
producing antibodies entails immunizing chickens and harvesting
the antibodies from their eggs.

Monoclonal Antibodies 

Monoclonal antibodies (MAbs) are homogeneous populations of
antibodies to a particular antigen. They may be obtained by any
technique which provides for the production of antibody molecules
by continuous cell lines in culture or in vivo. MAbs may be
produced by making hybridomas, which are immortalized cells capable
of secreting a specific monoclonal antibody.

Monoclonal antibodies to any of the proteins, peptides and
epitopes thereof described herein can be prepared from murine
hybridomas according to the classical method of Kohler, G. and
Milstein, C., Nature 256:495-497 (1975) (and U.S. Patent No.
4,376,110) or modifications of the methods thereof, such as the
human B-cell hybridoma technique (Kosbor et al., 1983, Immunology
Today 4:72; Cole et al., Proc. Natl. Acad. Sci. USA 80:
2026-2030), and the EBV-hybridoma technique (Cole et al., 1985,
MONOCLONAL ANTIBODIES AND CANCER THERAPY, Alan R. Liss, Inc., pp.
77-96).

In one method, a mouse is repetitively inoculated with a few
micrograms of the selected protein over a period of a few weeks.
The mouse is then sacrificed, and the antibody-producing cells of
the spleen are isolated.

The spleen cells are fused, typically using polyethylene
glycol, with mouse myeloma cells, such as SP2/0-Ag14 myeloma
cells. The excess, unfused cells are destroyed by growth of the
system on selective media comprising aminopterin (HAT media). The
successfully fused cells are diluted, and aliquots are plated to
microtiter plates where growth is continued. Antibody-producing
clones (hybridomas) are identified by detection of
antibody in the supernatant fluid of the wells by immunoassay
procedures. These include ELISA, as originally described by
Engvall, Meth. Enzymol. 70:419 (1980), western blot analysis,
radioimmunoassay (Lutz et al., Exp. Cell Res. 175:109-124 (1988))
and modified methods thereof.

Selected positive clones can be expanded and their monoclonal
antibody product harvested for use. Detailed procedures for
monoclonal antibody production are described in Davis, L. et al.
BASIC METHODS IN MOLECULAR BIOLOGY, Elsevier, New York. Section
21-2 (1989). The hybridoma clones may be cultivated in vitro or
in vivo, for instance as ascites. Production of high titers of
mAbs in vivo makes this the presently preferred method of
production. Alternatively, hybridoma culture in hollow fiber
bioreactors provides a continuous high yield source of monoclonal
antibodies.

The antibody class and subclass may be determined using
procedures known in the art (Campbell, A.M., Monoclonal Antibody
Technology: Laboratory Techniques in Biochemistry and Molecular
Biology, Elsevier Science Publishers, Amsterdam, The Netherlands
(1984)). MAbs may be of any immunoglobulin class including IgG,
IgM, IgE, IgA, IgD and any subclass thereof. Methods of purifying
monoclonal antibodies are well known in the art.

Antibody Derivatives and Fragments 

Fragments or derivatives of antibodies include any portion of
the antibody which is capable of binding the target antigen, or a
specific portion thereof. Antibody derivatives include
poly-specific (e.g., bi-specific) antibodies, which contain binding
sites specific for two or more different epitopes. These epitopes
may be from the same or different inventive molecules, or one or
more epitopes may be from a molecule not specifically disclosed
here.

Antibody fragments specifically include F(ab')2, Fab, Fab'
and Fv fragments. These can be generated from any class of
antibody, but typically are made from IgG or IgM. They may be
made by conventional recombinant DNA techniques or, using the
classical method, by proteolytic digestion with papain or pepsin.
See CURRENT PROTOCOLS IN IMMUNOLOGY, chapter 2, Coligan et al.,
eds., (John Wiley & Sons 1991-92).

F(ab')2 fragments are typically about 110 kDa (IgG) or about
150 kDa (IgM) and contain two antigen-binding regions, joined at
the hinge by disulfide bond(s). Virtually all, if not all, of the
Fc is absent in these fragments. Fab' fragments are typically
about 55 kDa (IgG) or about 75 kDa (IgM) and can be formed, for
example, by reducing the disulfide bond(s) of an F(ab')2 fragment.
The resulting free sulfhydryl group(s) may be used to conveniently
conjugate Fab' fragments to other molecules, such as detection
reagents (e.g., enzymes).

Fab fragments are monovalent and usually are about 50 kDa
(from any source). Fab fragments include the light (L) and heavy
(H) chain, variable (VL and VH, respectively) and constant (CL and
CH, respectively) regions of the antigen-binding portion of the
antibody. The H and L portions are linked by an intramolecular
disulfide bridge.

Fv fragments are typically about 25 kDa (regardless of
source) and contain the variable regions of both the light and
heavy chains (VL and VH, respectively). Usually, the VL and VH
chains are held together only by non-covalent interactions and,
thus, they readily dissociate. They do, however, have the
advantage of small size and they retain the same binding
properties of the larger Fab fragments. Accordingly, methods have
been developed to crosslink the VL and VH chains, using, for
example, glutaraldehyde (or other chemical crosslinkers),
intermolecular disulfide bonds (by incorporation of cysteines) and
peptide linkers. The resulting Fv is now a single chain (i.e.,
scFv).

Other antibody derivatives include single chain antibodies
(U.S. Patent 4,946,778; Bird, Science 242:423-426 (1988); Huston
et al., Proc. Natl. Acad. Sci. USA 85:5879-5883 (1988); and Ward
et al., Nature 334:544-546 (1989)). Single chain antibodies are
formed by linking the heavy and light chain fragments of the Fv
region via an amino acid bridge, resulting in a single chain Fv
(scFv).

One preferred method involves the generation of scFvs by
recombinant methods, which allows the generation of Fvs with new
specificities by mixing and matching variable chains from
different antibody sources. In a typical method, a recombinant
vector would be provided which comprises the appropriate
regulatory elements driving expression of a cassette region. The
cassette region would contain a DNA encoding a peptide linker,
with convenient sites at both the 5' and 3' ends of the linker for
generating fusion proteins. The DNA encoding a variable region(s)
of interest may be cloned in the vector to form fusion proteins
with the linker, thus generating an scFv.

In an exemplary alternative approach, DNAs encoding two Fvs
may be ligated to the DNA encoding the linker, and the resulting
tripartite fusion may be ligated directly into a conventional
expression vector. The scFv DNAs generated by any of these methods
may be expressed in prokaryotic or eukaryotic cells, depending on
the vector chosen.

Antibody fragments which recognize specific epitopes may be
generated by known techniques. For example, such fragments
include but are not limited to: the F(ab')2 fragments, which can be
produced by pepsin digestion of the antibody molecule, and the Fab
fragments, which can be generated by reducing the disulfide bridges
of the F(ab')2 fragments. Alternatively, Fab expression libraries
may be constructed (Huse et al., 1989, Science, 246:1275-1281) to
allow rapid and easy identification of monoclonal Fab fragments
with the desired specificity.
Derivatives also include "chimeric antibodies" (Morrison et
al., Proc. Natl. Acad. Sci., 81:6851-6855 (1984); Neuberger et
al., Nature, 312:604-608 (1984); Takeda et al., Nature, 314:452-
454 (1985)). These chimeras are made by splicing the DNA encoding
a mouse antibody molecule of appropriate specificity with, for
instance, DNA encoding a human antibody molecule of appropriate
specificity. Thus, a chimeric antibody is a molecule in which
different portions are derived from different animal species, such
as those having a variable region derived from a murine mAb and a
human immunoglobulin constant region. These are also known
sometimes as "humanized" antibodies and they offer the added
advantage of at least partial shielding from the human immune
system. They are, therefore, particularly useful in therapeutic
in vivo applications.

Labeled Antibodies 

The present invention further provides the above-described
antibodies in detectably labeled form. Antibodies can be
detectably labeled through the use of radioisotopes, affinity
labels (such as biotin, avidin, etc.), enzymatic labels (such as
horseradish peroxidase, alkaline phosphatase, etc.), fluorescent
labels (such as FITC or rhodamine, etc.), paramagnetic atoms, etc.
Procedures for accomplishing such labeling are well-known in the
art; for example, see Sternberger et al., J. Histochem. Cytochem.
18:315 (1970); Bayer et al., Meth. Enzym. 62:308 (1979); Engval et
al., Immunol. 109:129 (1972); Goding, J. Immunol. Meth. 13:215
(1976). The labeled antibodies of the present invention can be
used for in vitro, in vivo, and in situ diagnostic assays.

Immobilized Antibodies 

The foregoing antibodies also may be immobilized on a solid
support. Examples of such solid supports include plastics such as
polycarbonate, complex carbohydrates such as agarose and
sepharose, acrylic resins such as polyacrylamide, and latex
beads. Techniques for coupling antibodies to such solid supports
are well known in the art (Weir et al., "Handbook of Experimental
Immunology" 4th Ed., Blackwell Scientific Publications, Oxford,
England, Chapter 10 (1986); Jacoby et al., Meth. Enzym. 34
Academic Press, N.Y. (1974)). The immobilized antibodies of the
present invention can be used for in vitro, in vivo, and in situ
assays as well as for immunoaffinity purification of the proteins
of the present invention.

THERAPEUTIC AND DIAGNOSTIC COMPOSITIONS 

The proteins, antibodies and polynucleotides of the present
invention can be formulated according to known methods to prepare
pharmaceutically useful compositions, whereby these materials, or
their functional derivatives, are combined in admixture with a
pharmaceutically acceptable carrier vehicle. Suitable vehicles
and their formulation, inclusive of other human proteins, e.g.,
human serum albumin, are described, for example, in Remington's
Pharmaceutical Sciences (16th ed., Osol, A., Ed., Mack, Easton PA
(1980)). In order to form a pharmaceutically acceptable
composition suitable for effective administration, such
compositions will contain an effective amount of one or more of
the agents of the present invention, together with a suitable
amount of carrier vehicle.

Pharmaceutical compositions for use in accordance with the
present invention may be formulated in conventional manner using
one or more physiologically acceptable carriers or excipients.
Thus, the compounds and their physiologically acceptable salts and
solvates may be formulated for administration by inhalation or
insufflation (either through the mouth or the nose) or oral,
buccal, parenteral or rectal administration.

For oral administration, the pharmaceutical compositions may
take the form of, for example, tablets or capsules prepared by
conventional means with pharmaceutically acceptable excipients
such as binding agents (e.g., pregelatinised maize starch,
polyvinylpyrrolidone or hydroxypropyl methylcellulose); fillers
(e.g., lactose, microcrystalline cellulose or calcium hydrogen
phosphate); lubricants (e.g., magnesium stearate, talc or silica);
disintegrants (e.g., potato starch or sodium starch glycolate); or
wetting agents (e.g., sodium lauryl sulphate). The tablets may be
coated by methods well known in the art. Liquid preparations for
oral administration may take the form of, for example, solutions,
syrups or suspensions, or they may be presented as a dry product
for constitution with water or other suitable vehicle before use.
Such liquid preparations may be prepared by conventional means
with pharmaceutically acceptable additives such as suspending
agents (e.g., sorbitol syrup, cellulose derivatives or
hydrogenated edible fats); emulsifying agents (e.g., lecithin or
acacia); non-aqueous vehicles (e.g., almond oil, oily esters,
ethyl alcohol or fractionated vegetable oils); and preservatives
(e.g., methyl or propyl-p-hydroxybenzoates or sorbic acid). The
preparations may also contain buffer salts, flavoring, coloring
and sweetening agents as appropriate.

Preparations for oral administration may be suitably
formulated to give controlled release of the active compound. For
buccal administration the composition may take the form of tablets
or lozenges formulated in conventional manner.

For administration by inhalation, the compounds for use
according to the present invention are conveniently delivered in
the form of an aerosol spray presentation from pressurized packs
or a nebuliser, with the use of a suitable propellant, e.g.,
dichlorodifluoromethane, trichlorofluoromethane,
dichlorotetrafluoroethane, carbon dioxide or other suitable gas.
In the case of a pressurized aerosol the dosage unit may be
determined by providing a valve to deliver a metered amount.
Capsules and cartridges of, e.g., gelatin for use in an inhaler or
insufflator may be formulated containing a powder mix of the
compound and a suitable powder base such as lactose or starch.

The compounds may be formulated for parenteral administration
by injection, e.g., by bolus injection or continuous infusion.
Formulations for injection may be presented in unit dosage form,
e.g., in ampules or in multi-dose containers, with an added
preservative. The compositions may take such forms as
suspensions, solutions or emulsions in oily or aqueous vehicles,
and may contain formulatory agents such as suspending, stabilizing
and/or dispersing agents. Alternatively, the active ingredient
may be in powder form for constitution with a suitable vehicle,
e.g., sterile pyrogen-free water, before use.

The compounds may also be formulated in rectal compositions
such as suppositories or retention enemas, e.g., containing
conventional suppository bases such as cocoa butter or other
glycerides.

In addition to the formulations described previously, the
compounds may also be formulated as a depot preparation. Such
long acting formulations may be administered by implantation (for
example subcutaneously or intramuscularly) or by intramuscular
injection. Thus, for example, the compounds may be formulated
with suitable polymeric or hydrophobic materials (for example as
an emulsion in an acceptable oil) or ion exchange resins, or as
sparingly soluble derivatives, for example, as a sparingly soluble
salt.

The compositions may, if desired, be presented in a pack or
dispenser device which may contain one or more unit dosage forms
containing the active ingredient. The pack may for example
comprise metal or plastic foil, such as a blister pack. The pack
or dispenser device may be accompanied by instructions for
administration.

RECOMBINANT CONSTRUCTS AND EXPRESSION 

The present invention further provides recombinant DNA
constructs comprising one or more of the nucleotide sequences of
the present invention. The recombinant constructs of the present
invention comprise a vector, such as a plasmid or viral vector,
into which a DNA or DNA fragment, typically bearing an open
reading frame, is inserted, in either orientation. The gene
products encoded by the subject DNAs may be produced by
recombinant DNA technology using techniques well known in the art.
See, for example, the techniques described in Sambrook et al.,
1989, supra, and Ausubel et al., 1989, supra. Alternatively, the
DNA sequences may be chemically synthesized using, for example,
synthesizers. See, for example, the techniques described in
OLIGONUCLEOTIDE SYNTHESIS, Gait, ed., IRL Press, Oxford,
which is incorporated by reference herein in its entirety. They
may be assembled from fragments and short oligonucleotide linkers,
or from a series of oligonucleotides. They are preferably made by
RT-PCR methods. The resulting synthetic gene is capable of being
expressed in a recombinant vector.

In some cases the recombinant constructs will be expression
vectors, which are capable of expressing the RNA and/or protein
products of the encoded DNA(s). Thus, the vector may further
comprise regulatory sequences, including for example, a promoter,
operably linked to the open reading frame (ORF). The vector may
further comprise a selectable marker sequence.
Specific initiation signals may also be required for
efficient translation of inserted target gene coding sequences.
These signals include the ATG initiation codon and adjacent
sequences. In cases where a target DNA that includes its own
initiation codon and adjacent sequences is inserted into the
appropriate expression vector, no additional translation control
signals may be needed. However, in cases where only a portion of
an ORF is used, exogenous translational control signals,
including, perhaps, the ATG initiation codon, must be provided.
Furthermore, the initiation codon must be in phase with the
reading frame of the desired coding sequence to ensure translation
of the entire target. These exogenous translational control
signals and initiation codons can be of a variety of origins, both
natural and synthetic. The efficiency of expression may be
enhanced by the inclusion of appropriate transcription enhancer
elements, transcription terminators, etc. (see Bittner et al.,
Methods in Enzymol. 153:516-544 (1987)). Some appropriate cloning
and expression vectors for use with prokaryotic and eukaryotic
hosts are described by Sambrook, et al., in Molecular Cloning: A
Laboratory Manual, Second Edition, Cold Spring Harbor, New York
(1989), the disclosure of which is hereby incorporated by
reference.
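
The requirement that an exogenous initiation codon be in phase with
the reading frame of the coding sequence can be checked
mechanically. The following Python sketch is offered by way of
illustration only; the function name atg_in_phase, its parameters
and the toy sequences are assumptions introduced here and are not
part of the disclosed constructs.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def atg_in_phase(construct, atg_index, cds_start):
    """Return True if the ATG at atg_index is in phase with the coding
    sequence that begins at cds_start, with no intervening stop codon.

    Indices are 0-based positions in the construct (hypothetical layout).
    """
    seq = construct.upper()
    # The candidate initiation codon must actually be an ATG.
    if seq[atg_index:atg_index + 3] != "ATG":
        return False
    # In phase: the distance to the coding sequence is a multiple of three.
    if cds_start < atg_index or (cds_start - atg_index) % 3 != 0:
        return False
    # Translation must not terminate before reaching the coding sequence.
    for pos in range(atg_index, cds_start, 3):
        if seq[pos:pos + 3] in STOP_CODONS:
            return False
    return True
```

In this sketch a construct fails the check either because the
spacing between the ATG and the coding sequence is not a multiple
of three or because a stop codon lies between them in that frame.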

If desired, to enhance expression and facilitate proper
protein folding, the codon context and codon pairing of the
sequence may be optimized for the particular expression organism,
as explained by Hatfield et al., U.S. Patent No. 5,082,767.

The present invention further provides host cells containing
at least one of the DNAs of the present invention. The host cell
can be virtually any cell for which expression vectors are
available. It may be, for example, a higher eukaryotic host cell,
such as a mammalian cell, a lower eukaryotic host cell, such as a
yeast cell, or the host cell can be a prokaryotic cell, such as a
bacterial cell. Introduction of the recombinant construct into
the host cell can be effected by calcium phosphate transfection,
DEAE-dextran mediated transfection, or electroporation (Davis et
al., Basic Methods in Molecular Biology (1986)).

A wide variety of expression systems are available, such as:
yeast (e.g., Saccharomyces, Pichia) transformed with recombinant
yeast expression vectors containing the target DNA; insect cell
systems infected with recombinant virus expression vectors (e.g.,
baculovirus) containing the target DNA sequences; plant cell
systems infected with recombinant virus expression vectors (e.g.,
cauliflower mosaic virus, CaMV; tobacco mosaic virus, TMV) or
transformed with recombinant plasmid expression vectors (e.g., Ti
plasmid) containing target DNA coding sequences; or mammalian cell
systems (e.g., COS, CHO, BHK, 293, 3T3) harboring recombinant
expression constructs containing promoters derived from the genome
of mammalian cells (e.g., metallothionein promoter) or from
mammalian viruses (e.g., the adenovirus late promoter; the
vaccinia virus 7.5K promoter).

Depending on the system chosen, the resulting product may
differ. For example, proteins expressed in most bacterial
cultures, e.g., E. coli, will be free of glycosylation
modifications; polypeptides or proteins expressed in yeast will
have a glycosylation pattern different from that expressed in
mammalian cells.

Vectors 

Generally, recombinant expression vectors will include
origins of replication and selectable markers permitting selection
of the host cell, e.g., the ampicillin resistance gene of E. coli
and S. cerevisiae TRP1 gene, and a promoter derived from a
highly-expressed gene to direct transcription of a downstream
structural sequence. Such promoters can be derived from operons
encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK),
α-factor, acid phosphatase, or heat shock proteins, among others.
The heterologous structural sequence is assembled in appropriate
phase with translation initiation and termination sequences, and in
one aspect of the invention, a leader sequence capable of
directing secretion of translated protein into the periplasmic
space or extracellular medium. Optionally, the heterologous
sequence can encode a fusion protein including an N-terminal or
C-terminal identification peptide imparting desired
characteristics, e.g., stabilization or simplified purification of
expressed recombinant product.

Bacterial Expression 

Useful expression vectors for bacterial use are constructed
by inserting a structural DNA sequence encoding a desired protein
together with suitable translation initiation and termination
signals in operable reading phase with a functional promoter. The
vector will comprise one or more phenotypic selectable markers and
an origin of replication to ensure maintenance of the vector and,
if desirable, to provide amplification within the host. Suitable
prokaryotic hosts for transformation include E. coli, Bacillus
subtilis, Salmonella typhimurium and various species within the
genera Pseudomonas, Streptomyces, and Staphylococcus, although
others may also be employed as a matter of choice.

Bacterial vectors may be, for example, bacteriophage-,
plasmid- or cosmid-based. These vectors can comprise a selectable
marker and bacterial origin of replication derived from
commercially available plasmids typically containing elements of
the well known cloning vector pBR322 (ATCC 37017). Such
commercial vectors include, for example, GEM 1 (Promega Biotec,
Madison, WI, USA), pBs, phagescript, PsiX174, pBluescript SK, pBs
KS, pNH8a, pNH16a, pNH18a, pNH46a (Stratagene); pTrc99A, pKK223-3,
pKK233-3, pKK232-8, pDR540, and pRIT5 (Pharmacia).

These "backbone" sections are combined with an appropriate
promoter and the structural sequence to be expressed. Bacterial
promoters include lac, T3, T7, lambda PR or PL, trp, and ara.

Following transformation of a suitable host strain and growth
of the host strain to an appropriate cell density, the selected
promoter is derepressed/induced by appropriate means (e.g.,
temperature shift or chemical induction) and cells are cultured
for an additional period. Cells are typically harvested by
centrifugation, disrupted by physical or chemical means, and the
resulting crude extract retained for further purification.

In bacterial systems, a number of expression vectors may be
advantageously selected depending upon the use intended for the
protein being expressed. For example, when a large quantity of
such a protein is to be produced, for the generation of antibodies
or to screen peptide libraries, for example, vectors which direct
the expression of high levels of fusion protein products that are
readily purified may be desirable. Such vectors include, but are
not limited to, the E. coli expression vector pUR278 (Ruther et
al., 1983, EMBO J. 2:1791), in which the coding sequence may be
ligated into the vector in frame with the lac Z coding region so
that a fusion protein is produced; pIN vectors (Inouye et al.,
1985, Nucleic Acids Res. 13:3101-3109; Van Heeke et al., 1989, J.
Biol. Chem. 264:5503-5509); pET vectors (Studier et al., Methods
in Enzymology 185:60-89 (Academic Press 1990)); and the like.

Moreover, pGEX vectors may be used to express foreign
polypeptides as fusion proteins with glutathione S-transferase
(GST). In general, such fusion proteins are soluble and easily
can be purified from lysed cells by adsorption to
glutathione-agarose beads followed by elution in the presence of
free glutathione. The pGEX vectors are designed to include thrombin
or factor Xa protease cleavage sites so that the cloned target gene
protein can be released from the GST moiety.

In one embodiment, full length cDNA sequences are appended
with in-frame BamHI sites at the amino terminus and EcoRI sites at
the carboxyl terminus using standard PCR methodologies (Innis et
al., supra) and ligated into the pGEX-2TK vector (Pharmacia,
Uppsala, Sweden). The resulting cDNA construct contains a kinase
recognition site at the amino terminus for radioactive labeling
and glutathione S-transferase sequences at the carboxyl terminus
for affinity purification (Nilsson, et al., 1985, EMBO J. 4;
Zabeau and Stanley, 1982, EMBO J. 1:1217).
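
Appending restriction sites to a coding sequence by PCR amounts to
adding the recognition sequence to the 5' end of each primer. The
Python sketch below illustrates this under stated assumptions: the
3-base clamp, the default annealing length and the function names
are hypothetical, and the frame relative to the pGEX-2TK polylinker
is not modeled. Only the BamHI (GGATCC) and EcoRI (GAATTC)
recognition sequences are taken as given.

```python
BAMHI = "GGATCC"  # BamHI recognition sequence
ECORI = "GAATTC"  # EcoRI recognition sequence
CLAMP = "CGC"     # hypothetical 5' clamp so the enzyme has bases to grip

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def design_primers(cds, anneal_len=18):
    """Return (forward, reverse) primers, written 5'->3', that add a
    BamHI site upstream and an EcoRI site downstream of the cds."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    # An internal site would be cut during cloning, so reject it up front.
    if BAMHI in cds or ECORI in cds:
        raise ValueError("coding sequence has an internal BamHI or EcoRI site")
    # Each six-base site spans exactly two codons, so the reading frame
    # of the insert relative to the site itself is unchanged.
    forward = CLAMP + BAMHI + cds[:anneal_len]
    reverse = CLAMP + ECORI + reverse_complement(cds[-anneal_len:])
    return forward, reverse
```

Whether the insert is in frame with the vector's own coding
sequences depends on the vector polylinker and would need to be
confirmed against the vector map.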

Eukaryotic Expression 

Various mammalian cell culture systems can also be employed
to express recombinant protein. Examples of mammalian expression
systems include the COS-7 lines of monkey kidney fibroblasts,
described by Gluzman, Cell 23:175 (1981), and other cell lines
capable of expressing a compatible vector, for example, the C127,
3T3, CHO, HeLa and BHK cell lines. Mammalian expression vectors
will comprise an origin of replication, a suitable promoter and
enhancer, and also any necessary ribosome binding sites,
polyadenylation site, splice donor and acceptor sites,
transcriptional termination sequences, and 5' flanking
nontranscribed sequences. DNA sequences derived from the SV40
viral genome, for example, SV40 origin, early promoter, enhancer,
splice, and polyadenylation sites may be used to provide the
required nontranscribed genetic elements.

Mammalian promoters include CMV immediate early, HSV
thymidine kinase, early and late SV40, LTRs from retrovirus, and
mouse metallothionein-I. Exemplary mammalian vectors include
pWLneo, pSV2cat, pOG44, pXT1, pSG (Stratagene), pSVK3, pBPV, pMSG,
and pSVL (Pharmacia). Selectable markers include CAT
(chloramphenicol transferase).

In mammalian host cells, a number of viral-based expression
systems may be utilized. In cases where an adenovirus is used as
an expression vector, the coding sequence of interest may be
ligated to an adenovirus transcription/translation control
complex, e.g., the late promoter and tripartite leader sequence.
This chimeric gene may then be inserted in the adenovirus genome
by in vitro or in vivo recombination. Insertion in a non-essential
region of the viral genome (e.g., region E1 or E3) will
result in a recombinant virus that is viable and capable of
expressing a target protein in infected hosts. (E.g., see Logan
et al., 1984, Proc. Natl. Acad. Sci. USA 81:3655-3659.)

In one embodiment, cDNA sequences encoding the full-length
open reading frames are ligated into pCMVβ replacing the
β-galactosidase gene such that cDNA expression is driven by the
CMV promoter (Alam, 1990, Anal. Biochem. 188:245-254; MacGregor et
al., 1989, Nucl. Acids Res. 17:2365; Norton et al., 1985, Mol.
Cell. Biol. 5:281).

In addition, a host cell strain may be chosen which modulates
the expression of the inserted sequences, or modifies and
processes the gene product in the specific fashion desired. Such
modifications (e.g., glycosylation) and processing (e.g.,
cleavage) of protein products may be important for the function of
the protein. Different host cells have characteristic and
specific mechanisms for the post-translational processing and
modification of proteins.

Appropriate cell lines or host systems can be chosen to
ensure the correct modification and processing of the foreign
protein expressed. To this end, eukaryotic host cells which
possess the cellular machinery for proper processing of the
primary transcript, glycosylation, and phosphorylation of the gene
product may be used. Such mammalian host cells include but are
not limited to CHO, VERO, BHK, HeLa, COS, MDCK, 293, 3T3, WI38,
etc.

For long-term, high-yield production of recombinant proteins
in eukaryotic cells, stable expression is preferred. Rather than
using expression vectors which contain viral origins of
replication, host cells can be transformed with DNA controlled by
appropriate expression control elements (e.g., promoter, enhancer
sequences, transcription terminators, polyadenylation sites,
etc.), and a selectable marker.

Following the introduction of the foreign DNA, engineered
cells may be allowed to grow for 1-2 days in an enriched medium,
and then are switched to a selective medium. The selectable marker
in the recombinant plasmid confers resistance to the selection and
allows cells to stably integrate the plasmid into their
chromosomes and grow to form foci which in turn can be cloned and
expanded into cell lines. This method may advantageously be used
to engineer cell lines which express the target protein. Such
engineered cell lines may be particularly useful in screening and
evaluation of compounds that affect the endogenous activity of the
protein.

A number of selection systems may be used, including but not
limited to the herpes simplex virus thymidine kinase (Wigler, et
al., Cell 11:223 (1977)), hypoxanthine-guanine
phosphoribosyltransferase (Szybalska et al., Proc. Natl. Acad.
Sci. USA 48:2026 (1962)), and adenine phosphoribosyltransferase
(Lowy, et al., Cell 22:817 (1980)) genes, which can be employed in
tk-, hgprt- or aprt- cells, respectively. Also, antimetabolite
resistance can be used as the basis of selection for dhfr, which
confers resistance to methotrexate (Wigler, et al., Proc. Natl.
Acad. Sci. USA 77:3567 (1980); O'Hare, et al., 1981, Proc. Natl.
Acad. Sci. USA 78:1527); gpt, which confers resistance to
mycophenolic acid (Mulligan et al., Proc. Natl. Acad. Sci. USA
78:2072 (1981)); neo, which confers resistance to the
aminoglycoside G-418 (Colberre-Garapin, et al., 1981, J. Mol.
Biol. 150:1); and hygro, which confers resistance to hygromycin
(Santerre, et al., 1984, Gene 30:147) genes.
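
The selection systems listed above fall into two groups: markers
that complement a deficiency in the host cell (tk, hgprt, aprt) and
markers that confer resistance to a selective agent (dhfr, gpt, neo,
hygro). A minimal Python sketch of that relationship follows; the
table layout and the function name usable_markers are assumptions
made for illustration, while the marker names and agents are those
recited in the text.

```python
# Marker -> host deficiency it requires, or the agent it confers resistance to.
SELECTION_SYSTEMS = {
    "tk":    {"host_deficiency": "tk-",    "resistance": None},
    "hgprt": {"host_deficiency": "hgprt-", "resistance": None},
    "aprt":  {"host_deficiency": "aprt-",  "resistance": None},
    "dhfr":  {"host_deficiency": None, "resistance": "methotrexate"},
    "gpt":   {"host_deficiency": None, "resistance": "mycophenolic acid"},
    "neo":   {"host_deficiency": None, "resistance": "G-418"},
    "hygro": {"host_deficiency": None, "resistance": "hygromycin"},
}

def usable_markers(host_deficiencies):
    """Return the markers usable in a host with the given deficiencies.

    Resistance markers are always listed; complementation markers are
    listed only when the host carries the matching deficiency.
    """
    usable = []
    for marker, info in SELECTION_SYSTEMS.items():
        needed = info["host_deficiency"]
        if needed is None or needed in host_deficiencies:
            usable.append(marker)
    return sorted(usable)
```

For a host with no deficiencies the sketch returns only the four
resistance markers; passing {"tk-"} adds tk to that list.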

An alternative fusion protein system allows for the ready
purification of non-denatured fusion proteins expressed in human
cell lines (Janknecht, et al., Proc. Natl. Acad. Sci. USA 88:
8972-8976 (1991)). In this system, the gene of interest is
subcloned into a vaccinia-based plasmid such that the gene's open
reading frame is translationally fused to an amino-terminal tag
consisting of six histidine residues. Extracts from cells
infected with recombinant vaccinia virus are loaded onto Ni2+
nitrilotriacetic acid-agarose columns and histidine-tagged proteins
are selectively eluted with imidazole-containing buffers.

In an insect system, Autographa californica nuclear
polyhedrosis virus (AcNPV) is used as a vector to express foreign
genes. The virus grows in Spodoptera frugiperda cells. The
target coding sequence may be cloned individually into
non-essential regions (for example the polyhedrin gene) of the virus
and placed under control of an AcNPV promoter (for example the
polyhedrin promoter). Successful insertion of a target gene
coding sequence will result in inactivation of the polyhedrin gene
and production of non-occluded recombinant virus (i.e., virus
lacking the proteinaceous coat coded for by the polyhedrin gene).
These recombinant viruses are then used to infect Spodoptera
frugiperda cells in which the inserted gene is expressed. (E.g.,
see Smith et al., 1983, J. Virol. 46:584; Smith, U.S. Patent No.
4,215,051).

While the present proteins can be expressed in recombinant
systems, as described above, cell-free translation systems can
also be employed to produce such proteins using RNAs derived from
the DNA constructs of the present invention.

Purification of Recombinant Proteins

Recombinant proteins produced may be isolated by host cell
lysis. This may be followed by one or more salting-out, aqueous
ion exchange or size exclusion chromatography steps. Finally,
high performance liquid chromatography (HPLC) can be employed for
final purification steps. Microbial cells employed in expression
of proteins can be disrupted by any convenient method, including
freeze-thaw cycling, sonication, mechanical disruption, or use of
cell lysing agents, like lysozyme and chelators.

If inclusion bodies are formed in bacterial systems, they may
be extracted from cell pellets using, for example, detergents,
reducing agents, salts, urea, guanidinium chloride and extremes of
pH (e.g., <4 or >10). If denaturation occurs, protein refolding
steps (e.g., dialysis) can be used, as necessary, in completing
configuration of the mature protein. If disulfide bridges are
present in the native protein, they may be reoxidized using known
methods.

By way of specific non-limiting example, the recombinant
bacterial cells, for example E. coli, are grown in any of a number
of suitable media, for example LB, and the expression of the
recombinant protein induced by adding IPTG (e.g., lac
operator-promoter) to the media or switching incubation to a higher
temperature (e.g., λ cI857). After culturing the bacteria for a
further period of between 2 and 24 hours, the cells are collected
by centrifugation and washed to remove residual media. The
bacterial cells are then lysed, for example, by disruption in a
cell homogenizer and centrifuged to separate the cell membranes
from the soluble cell components. If the protein aggregates into
inclusion bodies, this centrifugation can be performed under
conditions whereby the dense inclusion bodies are selectively
enriched by incorporation of sugars such as sucrose into the
buffer and centrifugation at a selective speed. The inclusion
bodies can then be washed in any of several solutions to remove
some of the contaminating host proteins, then solubilized in
solutions containing high concentrations of urea (e.g., 8 M) or
chaotropic agents such as guanidinium hydrochloride in the
presence of reducing agents such as β-mercaptoethanol or DTT
(dithiothreitol).

At this stage it may be advantageous to incubate the protein
for several hours under conditions suitable for the protein to
undergo a refolding process into a conformation which more closely
resembles that of the native protein. Such conditions generally
include low protein concentrations (less than 500 µg/ml), low
levels of reducing agent, concentrations of urea less than 2 M and
often the presence of reagents such as a mixture of reduced and
oxidized glutathione which facilitate the interchange of
disulphide bonds within the protein molecule. The refolding
process can be monitored, for example, by SDS-PAGE or with
antibodies which are specific for the native molecule. Following
refolding, the protein can then be purified further and separated
from the refolding mixture by chromatography on any of several
supports including ion exchange resins, gel permeation resins or
on a variety of affinity columns.

Labeling Proteins 

When used as a component in assay systems such as those
described below, the target protein may be labeled, either
directly or indirectly, to facilitate detection of the present
res-like molecules either in vitro or in vivo. Any of a variety
of suitable labeling systems may be used including but not limited
to radioisotopes such as 125I; enzyme labeling systems that
generate a detectable colorimetric signal or light when exposed to
substrates; and fluorescent labels.

Where recombinant DNA technology is used for protein
production, it may be advantageous to engineer fusion proteins
that can facilitate labeling, immobilization and/or detection.
These fusion proteins may, for example, add amino acids which
facilitate further chemical modification. They also may add a
functional moiety, such as an enzyme, which directly facilitates
detection.



TRANSGENIC ANIMALS

The invention further contemplates animal models for studying
the function of the present molecules and for overproducing the
protein products. The disclosed DNA sequences may be used in
conjunction with techniques for producing transgenic animals that
are well known to those of skill in the art.

To prepare transgenic animals, target gene sequences may for
example be introduced into, and overexpressed in, the genome of
the animal of interest, or, if endogenous target gene sequences
are present, they may either be overexpressed or, alternatively,
be disrupted in order to underexpress or inactivate target gene
expression, such as described for the disruption of apoE in mice
(Plump et al., Cell 71:343-353 (1992)).

In order to overexpress a target gene sequence, the coding
portion of the target gene sequence may be ligated to a regulatory
sequence which is capable of driving gene expression in the animal
and cell type of interest. Such regulatory regions will be well
known to those of skill in the art, and may be utilized in the
absence of undue experimentation.

For underexpression of an endogenous target gene sequence,
such a sequence may be isolated and engineered such that when
reintroduced into the genome of the animal of interest, the
endogenous target gene alleles will be inactivated. Preferably,
the engineered target gene sequence is introduced via gene
targeting such that the endogenous target sequence is disrupted
upon integration of the engineered target gene sequence into the
animal's genome. Animals of any species, including, but not
limited to, mice, rats, rabbits, guinea pigs, pigs, micro-pigs,
goats, and non-human primates, e.g., baboons, monkeys, and
chimpanzees may be used to generate cardiovascular disease animal
models. Goats, cows and sheep are particularly preferred for
producing protein in vivo.

Any technique known in the art may be used to introduce a
target gene transgene into animals to produce the founder lines of
transgenic animals. Such techniques include, but are not limited
to, pronuclear microinjection (Hoppe et al., U.S. Pat. No.
4,873,191 (1989)); retrovirus mediated gene transfer into germ
lines (Van der Putten et al., Proc. Natl. Acad. Sci., USA
82:6148-6152 (1985)); gene targeting in embryonic stem cells
(Thompson et al., Cell 56:313-321 (1989)); electroporation of
embryos (Lo, Mol. Cell. Biol. 3:1803-1814 (1983)); and
sperm-mediated gene transfer (Lavitrano et al., Cell 57:717-723
(1989)); etc. For a review of such techniques, see Gordon,
Transgenic Animals, Intl. Rev. Cytol. 115:171-229 (1989).

The present invention provides for transgenic animals that
carry the transgene in all their cells, as well as animals which
carry the transgene in some, but not all their cells, i.e., mosaic
animals. The transgene may be integrated as a single transgene or
in concatamers, e.g., head-to-head tandems or head-to-tail
tandems. The transgene may also be selectively introduced into
and activated in a particular cell type by following, for example,
the teaching of Lasko et al. (Lasko et al., Proc. Natl. Acad. Sci.
USA 89:6232-6236 (1992)). The regulatory sequences required for
such a cell-type specific activation will depend upon the
particular cell type of interest, and will be apparent to those of
skill in the art. When it is desired that the target gene be
integrated into the chromosomal site of the endogenous target
gene, gene targeting is preferred. Briefly, when such a technique
is to be utilized, vectors containing some nucleotide sequences
homologous to the endogenous target gene of interest are designed
... for the purpose of integrating with
chromosomal sequences, into and disrupting the function of the
nucleotide sequence of the endogenous target gene.

The transgene may also be selectively introduced into a
particular cell type, thus inactivating the endogenous gene of
interest in only that cell type, by following, for example, the
teaching of Gu et al. (Science 265:103-106 (1994)). The
regulatory sequences required for such a cell-type specific
inactivation will depend upon the particular cell type of
interest, and will be apparent to those of skill in the art.

Once transgenic animals have been generated, the expression
of the recombinant target gene and protein may be assayed
utilizing standard techniques. Initial screening may be
accomplished by Southern blot analysis or PCR techniques to
analyze animal tissues to assay whether integration of the
transgene has taken place. The level of mRNA expression of the
transgene in the tissues of the transgenic animals may also be
assessed using techniques which include but are not limited to
Northern blot analysis of tissue samples obtained from the animal,
in situ hybridization analysis, and RT-PCR. Samples of target
gene-expressing tissue may also be evaluated immunocytochemically
using antibodies specific for the target gene transgene gene
product of interest.

The transgenic animals that express target gene mRNA or
target gene transgene peptide (detected immunocytochemically,
using antibodies directed against the target gene product's
epitopes) at easily detectable levels should then be further
evaluated to identify those animals which display characteristic
increased susceptibility to carcinogenesis. Additionally,
specific cell types within the transgenic animals may be analyzed
and assayed in vitro for cellular phenotypes characteristic of
mutant phenotype.

Once target gene transgenic founder animals are produced,
they may be bred, inbred, outbred, or crossbred to produce
colonies of the particular animal. Examples of such breeding
strategies include but are not limited to: outbreeding of founder
animals with more than one integration site in order to establish
separate lines; inbreeding of separate lines in order to produce
compound target gene transgenics that express the target gene
transgene of interest at higher levels because of the effects of
additive expression of each target gene transgene; crossing of
heterozygous transgenic animals to produce animals homozygous for
a given integration site in order both to augment expression and
eliminate the possible need for screening of animals by DNA
analysis; crossing of separate homozygous lines to produce
compound heterozygous or homozygous lines; breeding animals to
different inbred genetic backgrounds so as to examine effects of
modifying alleles on expression of the target gene transgene and
the possible development of carcinogenesis. One such approach is
to cross the target gene transgenic founder animals with a wild
type strain to produce an F1 generation that exhibits increased
susceptibility to carcinogenesis. The F1 generation may then be
inbred in order to develop a homozygous line, if it is found that
homozygous target gene transgenic animals are viable.
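
The expected outcome of the crosses described above can be worked
out with a simple Punnett-square calculation. The Python sketch
below assumes a single integration site that segregates in
Mendelian fashion, with "T" denoting the transgene allele and "+"
the wild-type (no insertion) allele; the notation and the function
name cross are illustrative assumptions, not part of the disclosure.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def cross(parent1, parent2):
    """Return expected offspring genotype frequencies for one locus.

    Each parent is a two-character string of alleles, e.g. "+T" for a
    heterozygous transgenic animal. Genotypes are reported with their
    alleles sorted, so "T+" and "+T" count as the same genotype.
    """
    # Every gamete from one parent pairs with every gamete from the other.
    offspring = Counter(
        "".join(sorted(a + b)) for a, b in product(parent1, parent2)
    )
    total = sum(offspring.values())
    return {genotype: Fraction(n, total) for genotype, n in offspring.items()}
```

Under these assumptions, crossing two heterozygotes ("+T" x "+T")
gives one quarter homozygous transgenic offspring, while crossing a
homozygous animal with wild type ("TT" x "++") gives only
heterozygotes. Founders with more than one integration site would
need one such calculation per site.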
Methods of generating "knockout" mice using homologous
recombination in embryonic stem cells are well known in the art.
Suitable methods are described, for example, in Mansour et al.,
Nature, 336:348 (1988); Zijlstra et al., Nature, 342:435 (1989)
and 344:742 (1990); and Hasty et al., Nature, 350:243 (1991).
This genomic DNA can be obtained by conventional methods using the
cDNA sequence as a probe in a commercially-available genomic DNA
library.

Briefly, a genomic fragment is cleaved with a restriction
endonuclease and a heterologous cassette containing a
neomycin-resistance gene is inserted at the cleavage site. A
suitable cassette is the GTI-II neo cassette described by Lufkin
et al., Cell 66:1105 (1991). The modified genomic fragment is
cloned into a suitable targeting vector that is introduced into
murine embryonic stem cells by electroporation. Cells that have
undergone homologous recombination (and hence disruption of the
gene) are selected by resistance to G418, and used to generate
chimeric mice using well known methods. See Lufkin et al., supra.
Traditional breeding methods then can be used to generate mice
that are homozygous for the disrupted gene.
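
The first step above, insertion of a cassette at a restriction
cleavage site, can be pictured as a string operation on the top
strand of the genomic fragment. The Python sketch below is a
simplified illustration only: it ignores overhangs and the
complementary strand, the cut offset is supplied by the caller, and
the function name and toy sequences are hypothetical placeholders
rather than the GTI-II neo cassette itself.

```python
def insert_cassette(genomic_fragment, site, cut_offset, cassette):
    """Insert a cassette at the cut position of a unique restriction site.

    cut_offset is the number of bases into the recognition site at which
    the top strand is cleaved (1 for BamHI, which cuts G^GATCC).
    """
    frag = genomic_fragment.upper()
    site = site.upper()
    # Require a unique site so the insertion point is unambiguous.
    if frag.count(site) != 1:
        raise ValueError("fragment must contain exactly one copy of the site")
    cut = frag.index(site) + cut_offset
    # Splice the cassette between the two halves of the cleaved fragment.
    return frag[:cut] + cassette.upper() + frag[cut:]
```

A gene disrupted in this way no longer contains the original
uninterrupted sequence, which is the basis for the loss of function
relied on in the targeting step.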

The phenotype of mice that are homozygous for the mutation
then can be studied to provide insights into the role of the
protein in, for example, carcinogenesis. These mice also can be
used as models for developing new treatments for cancers. If this
mutation is lethal in homozygous mice (for example during
embryogenesis), heterozygous mice, which express only half the
amount of the protein, can also be studied.

GENE THERAPY APPLICATIONS 

When mutations in the inventive protein, or in the elements
controlling expression of that protein, are found to be associated
with a malignant phenotype, control of cellular proliferation can
be restored by gene therapy methods. For example, overexpression
of the protein can be counteracted by concurrent expression of an
antisense molecule that binds to and inhibits expression of the
mRNA encoding the protein. Alternatively, overexpression can be
inhibited in an analogous manner using a ribozyme that cleaves the
mRNA. In another embodiment, where expression of a mutated
protein induces the malignant phenotype, concomitant expression of
the non-mutated molecule via introduction of an exogenous gene may
be used. Methods of using antisense and ribozyme technology to
control gene expression, or of gene therapy methods for expression
of an exogenous gene in this manner are well known in the art.
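
An antisense molecule of the kind described above is, at its
simplest, the reverse complement of the target mRNA, so that the
two strands can base-pair. The Python sketch below shows only that
sequence relationship; the function name antisense_rna and the
example sequence are assumptions for illustration, and practical
antisense design involves further considerations not modeled here.

```python
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")

def antisense_rna(mrna):
    """Return the antisense RNA (5'->3') for an mRNA given 5'->3'.

    DNA-style input is accepted: any T is first converted to U.
    """
    rna = mrna.upper().replace("T", "U")
    # Complement each base, then reverse so the result reads 5'->3'.
    return rna.translate(_RNA_COMPLEMENT)[::-1]
```

Applying the function twice returns the original mRNA sequence,
which is a quick consistency check on any antisense design.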

Each of these methods requires a system for introducing a
vector into the cells containing the mutated gene. The vector
encodes either an antisense or ribozyme transcript of the
inventive protein. The construction of a suitable vector can be
achieved by any of the methods well-known in the art for the
insertion of exogenous DNA into a vector. See, e.g., Sambrook et
al., Molecular Cloning (Cold Spring Harbor Press 2d ed. 1989),
which is incorporated herein by reference. In addition, the prior
art teaches various methods of introducing exogenous genes into
cells in vivo. See Rosenberg et al., Science 242:1575-1578 (1988)
and Wolff et al., PNAS 86:9011-9014 (1989), which are incorporated
herein by reference. The routes of delivery include systemic
administration and administration in situ. Well-known techniques
include systemic administration with cationic liposomes, and
administration in situ with viral vectors. Any one of the gene
delivery methodologies described in the prior art is suitable for
the introduction of a recombinant vector containing an inventive
gene according to the invention into a MTX-resistant,
transport-deficient cancer cell. A listing of present-day vectors
suitable for the purpose of this invention is set forth in Hodgson,
Bio/Technology 13:222 (1995), which is incorporated by reference.
For example, liposome-mediated gene transfer is a suitable
method for the introduction of a recombinant vector containing an
inventive gene according to the invention into a MTX-resistant,
transport-deficient cancer cell. The use of a cationic liposome,
such as DC-Chol/DOPE liposome, has been widely documented as an
appropriate vehicle to deliver DNA to a wide range of tissues
through intravenous injection of DNA/cationic liposome complexes.
See Caplen et al., Nature Med. 1:39-46 (1995) and Zhu et al.,
Science 261:209-211 (1993), which are herein incorporated by
reference. Liposomes transfer genes to the target cells by fusing
with the plasma membrane. The entry process is relatively
efficient, but once inside the cell, the liposome-DNA complex has
no inherent mechanism to deliver the DNA to the nucleus. As such,
most of the lipid and DNA gets shunted to cytoplasmic waste
systems and destroyed. The obvious advantage of liposomes as a
gene therapy vector is that liposomes contain no proteins, which
minimizes the potential of host immune responses.

As another example, viral vector-mediated gene transfer is
also a suitable method for the introduction of the vector into a
target cell. Appropriate viral vectors include adenovirus vectors
and adeno-associated virus vectors, retrovirus vectors and
herpesvirus vectors.

Adenoviruses are linear, double stranded DNA viruses
complexed with core proteins and surrounded by capsid proteins.
The common serotypes 2 and 5, which are not associated with any
human malignancies, are typically the base vectors. By deleting
parts of the virus genome and inserting the desired gene under the
control of a constitutive viral promoter, the virus becomes a
replication deficient vector capable of transferring the exogenous
DNA to differentiated, non-proliferating cells. To enter cells,
the adenovirus fibre interacts with specific receptors on the cell
surface, and the adenovirus surface proteins interact with the
cell surface integrins. The virus penton-cell integrin
interaction provides the signal that brings the exogenous
gene-containing virus into a cytoplasmic endosome. The adenovirus
breaks out of the endosome and moves to the nucleus, the viral
capsid falls apart, and the exogenous DNA enters the cell nucleus
where it functions, in an epichromosomal fashion, to express the
exogenous gene. Detailed discussions of the use of adenoviral
vectors for gene therapy can be found in Berkner, Biotechniques
6:616-629 (1988) and Trapnell, Advanced Drug Delivery Rev.
12:185-199 (1993), which are herein incorporated by reference.
Adenovirus-derived vectors, particularly non-replicative
adenovirus vectors, are characterized by their ability to
accommodate exogenous DNA of 7.5 kb, relative stability, wide host
range, low pathogenicity in man, and high titers (10^4 to 10^5
plaque forming units per cell). See Stratford-Perricaudet et al.,
PNAS 89:2581 (1992).

Adeno-associated virus (AAV) vectors also can be used for the
present invention. AAV is a linear single-stranded DNA parvovirus
that is endogenous to many mammalian species. AAV has a broad
host range despite the limitation that AAV is a defective
parvovirus which is dependent totally on either adenovirus or
herpesvirus for its reproduction in vivo. The use of AAV as a
vector for the introduction into target cells of exogenous DNA is
well-known in the art. See, e.g., Lebkowski et al., Mole. & Cell.
Biol. 8:3988 (1988), which is incorporated herein by reference.
In these vectors, the capsid gene of AAV is replaced by a desired
DNA fragment, and transcomplementation of the deleted capsid
function is used to create a recombinant virus stock. Upon
infection the recombinant virus uncoats in the nucleus and
integrates into the host genome.

Another suitable virus-based gene delivery mechanism is
retroviral vector-mediated gene transfer. In general, retroviral
vectors are well-known in the art. See Breakfield et al., Mole.
Neuro. Biol. 1:339 (1987) and Shih et al. in Vaccines 85: 177
(Cold Spring Harbor Press 1985). A variety of retroviral vectors
and retroviral vector-producing cell lines can be used for the
present invention. Appropriate retroviral vectors include Moloney
Murine Leukemia Virus, spleen necrosis virus, and vectors derived
from retroviruses such as Rous Sarcoma Virus, Harvey Sarcoma
Virus, avian leukosis virus, human immunodeficiency virus,
myeloproliferative sarcoma virus, and mammary tumor virus. These
vectors include replication-competent and replication-defective
retroviral vectors. In addition, amphotropic and xenotropic
retroviral vectors can be used. In carrying out the invention,
retroviral vectors can be introduced to a tumor directly or in the
form of free retroviral vector-producing cell lines. Suitable
producer cells include fibroblasts, neurons, glial cells,
keratinocytes, hepatocytes, connective tissue cells, ependymal
cells, and chromaffin cells. See Wolff et al., PNAS 84:3344 (1989).

Retroviral vectors generally are constructed such that the
majority of their structural genes are deleted or replaced by
exogenous DNA of interest, and such that the likelihood is reduced
that viral proteins will be expressed. See Bender et al., J.
Virol. 61:1639 (1987) and Armento et al., J. Virol. 61:1647
(1987), which are herein incorporated by reference. To facilitate
expression of the antisense or ribozyme molecule of the inventive
protein, a retroviral vector employed in the present invention
must integrate into the genome of the host cell, an event
which occurs only in mitotically active cells. The necessity for
host cell replication effectively limits retroviral gene
expression to tumor cells, which are highly replicative, and to a
few normal tissues. The normal tissue cells theoretically most
likely to be transduced by a retroviral vector, therefore, are the
endothelial cells that line the blood vessels that supply blood to
the tumor. In addition, it is also possible that a retroviral
vector would integrate into white blood cells either in the tumor
or in the blood circulating through the tumor.

The spread of retroviral vector to normal tissues, however,
is limited. The local administration to a tumor of a retroviral
vector or retroviral vector producing cells will restrict vector
propagation to the local region of the tumor, minimizing
transduction, integration, expression and subsequent cytotoxic
effect on surrounding cells that are mitotically active.

Both replicatively deficient and replicatively competent
retroviral vectors can be used in the invention, subject to their
respective advantages and disadvantages. For instance, for tumors
that have spread regionally, such as lung cancers, the direct
injection of cell lines that produce replication-deficient vectors
may not deliver the vector to a large enough area to completely
eradicate the tumor, since the vector will be released only from
the original producer cells and their progeny, and diffusion is
limited. Similar constraints apply to the application of
replication deficient vectors to tumors that grow slowly, such as
human breast cancers, which typically have doubling times of 30
days versus the 24 hours common among human gliomas. The much
shortened survival-time of the producer cells, probably no more
than 7-14 days in the absence of immunosuppression, limits to only
a portion of their replicative cycle the exposure of the tumor
cells to the retroviral vector.

The use of replication-defective retroviruses for treating
tumors requires producer cells and is limited because each
replication-defective retrovirus particle can enter only a single
cell and cannot productively infect others thereafter. Because
these replication-defective retroviruses cannot spread to other
tumor cells, they would be unable to completely penetrate a deep,
multilayered tumor in vivo. See Markert et al., Neurosurg. 77:
590 (1992). The injection of replication-competent retroviral
vector particles or a cell line that produces a
replication-competent retroviral vector virus may prove to be a
more effective therapeutic because a replication competent
retroviral vector will establish a productive infection that will
transduce cells as long as it persists. Moreover, replicatively
competent retroviral vectors may follow the tumor as it
metastasizes, carried along and propagated by transduced tumor
cells. The risks for complications are greater with replicatively
competent vectors, however. Such vectors may pose a greater risk
than replicatively deficient vectors of transducing normal
tissues, for instance. The risks of undesired vector propagation
for each type of cancer and affected body area can be weighed
against the advantages in the situation of replicatively competent
versus replicatively deficient retroviral vector to determine an
optimum treatment.

Both amphotropic and xenotropic retroviral vectors may be
used in the invention. Amphotropic viruses have a very broad host
range that includes most or all mammalian cells, as is well known
to the art. Xenotropic viruses can infect all mammalian cells
except mouse cells. Thus, amphotropic and xenotropic
retroviruses from many species, including cows, sheep, pigs, dogs,
cats, rats, and mice, inter alia, can be used to provide retroviral
vectors in accordance with the invention, provided the vectors can
transfer genes into proliferating human cells in vivo.

Clinical trials employing retroviral vector therapy treatment
of cancer have been approved in the United States. See Culver,
Clin. Chem. 40: 510 (1994). Retroviral vector-containing cells
have been implanted into brain tumors growing in human patients.
See Oldfield et al., Hum. Gene Ther. 4: 39 (1993). These
retroviral vectors carried the HSV-1 thymidine kinase (HSV-tk)
gene into the surrounding brain tumor cells, which conferred
sensitivity of the tumor cells to the antiviral drug ganciclovir.
Some of the limitations of current retroviral based cancer
therapy, as described by Oldfield, are: (1) the low titer of virus
produced, (2) virus spread is limited to the region surrounding
the producer cell implant, (3) possible immune response to the
producer cell line, (4) possible insertional mutagenesis and
transformation of retroviral infected cells, (5) only a single
treatment regimen of pro-drug, ganciclovir, is possible because
the "suicide" product kills retrovirally infected cells and
producer cells, and (6) the bystander effect is limited to cells in
direct contact with retrovirally transformed cells. See Bi et
al., Human Gene Therapy 4: 725 (1993).

Yet another suitable virus-based gene delivery mechanism is
herpesvirus vector-mediated gene transfer. While much less is
known about the use of herpesvirus vectors, replication-competent
HSV-1 viral vectors have been described in the context of
antitumor therapy. See Martuza et al., Science 252: 854 (1991),
which is incorporated herein by reference.

DIAGNOSTIC METHODS 

The present invention also contemplates, for certain
molecules described below, methods for diagnosis of human disease.
In particular, patients can be screened for the occurrence of
cancers, or likelihood of occurrence of cancers, associated with
mutations in the encoded protein. DNA from tumor tissue obtained
from patients suffering from cancer can be isolated, and the gene
encoding the protein can be sequenced. By examining a number of
patients in this manner, mutations in the gene that are associated
with a malignant cellular phenotype can be identified. In
addition, correlation of the nature of the observed mutations with
subsequent observed clinical outcomes allows development of a
prognostic model for the predicted outcome in a particular
patient.
Screening for mutations conveniently can be carried out at
the DNA level by use of PCR, although the skilled artisan will be
aware that many other well known methods are available for the
screening. PCR primers can be selected that flank known mutation
sites, and the PCR products can be sequenced to detect the
occurrence of the mutation. Alternatively, the 3' residue of one
PCR primer can be selected to be a match only for the residue
found in the unmutated gene. If the gene is mutated, there will
be a mismatch at the 3' end of the primer, primer extension
cannot occur, and no PCR product will be obtained. Alternatively,
primer mixtures can be used where the 3' residue of one primer is
any nucleotide other than the nonmutated residue. Observation of
a PCR product then indicates that a mutation has occurred. Other
methods of using, for example, oligonucleotide probes to screen
for mutations are described, for example, in U.S. Patent No.
4,871,838, which is herein incorporated by reference in its
entirety.
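
The 3'-residue logic of the two primer designs described above can be
sketched in a few lines of Python. This is only an illustrative model,
not a primer design tool: the sequences are invented placeholders, the
function names are hypothetical, and the primer is written in the same
sense as the allele strand it is compared with, so that "annealing" is
modelled as simple string identity.

```python
def product_expected(primer: str, allele: str, site: int) -> bool:
    """Model of 3'-specific priming (illustrative assumption only).

    The primer's 3' base is positioned at index `site` of the allele.
    A product is expected only if the primer body anneals and the
    3' terminal base matches the allele at that position.
    """
    primer, allele = primer.upper(), allele.upper()
    start = site - len(primer) + 1
    if start < 0 or site >= len(allele):
        raise ValueError("primer does not fit on the allele at this site")
    body_matches = primer[:-1] == allele[start:site]
    three_prime_matches = primer[-1] == allele[site]
    return body_matches and three_prime_matches


# Hypothetical sequences; position 9 (0-based) is the mutation site.
WILD_TYPE = "ATGGCCTTAGGCTGA"
MUTANT = "ATGGCCTTATGCTGA"
SITE = 9

# Design 1: 3' residue matches only the unmutated gene.
wt_primer = WILD_TYPE[: SITE + 1]

# Design 2: a mixture whose 3' residue is any base except the
# unmutated one; any product then indicates a mutation.
mutation_mix = [WILD_TYPE[:SITE] + b for b in "ACGT" if b != WILD_TYPE[SITE]]


def mixture_detects_mutation(allele: str) -> bool:
    """True if any primer of the mixture would yield a product."""
    return any(product_expected(p, allele, SITE) for p in mutation_mix)
```

With these placeholder sequences the wild-type primer yields a product
only on the unmutated allele, while the primer mixture yields a product
only on the mutated one.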

Alternatively, antibodies can be generated that selectively
bind either mutated or non-mutated protein. The antibodies then
can be used to screen tissue samples for occurrence of mutations
in a manner analogous to the DNA-based methods described supra.

The diagnostic methods described above can be used not only
for diagnosis and for prognosis of existing disease, but may also
be used to predict the likelihood of the future occurrence of
disease. For example, clinically healthy patients can be screened
for mutations in the inventive molecule that correlate with later
disease onset. Such mutations may be observed in the heterozygous
state in healthy individuals. In such cases a single mutation
event can effectively disable proper functioning of the gene and
induce a transformed or malignant phenotype. This screening also
may be carried out prenatally or neonatally.

DNA molecules according to the invention also are well suited
for use in so-called "gene chip" diagnostic applications. Such
applications have been developed by, inter alia, Synteni and
Affymetrix. Briefly, all or part of the DNA molecules of the
invention can be used either as a probe to screen a polynucleotide
array on a "gene chip," or they may be immobilized on the chip
itself and used to identify other polynucleotides via
hybridization to the surface of the chip. In this manner, for
example, related genes can be identified, or expression patterns
of the gene in various tissues can be simultaneously studied.
Such gene chips have particular application for diagnosis of
disease, or in forensic analysis to detect the presence or absence
of an analyte. Suitable chip technology is described, for example,
in Wodicka et al., Nature Biotechnology (1997), which is
hereby incorporated by reference in its entirety, and references
cited therein.

PROTEIN-PROTEIN INTERACTIONS

Due to their similarity to certain known proteins, it is
anticipated that some of the inventive protein molecules will
interact with another class of cellular proteins. This is
particularly true of those molecules containing leucine zipper
motifs.

Any method suitable for detecting protein-protein
interactions can be employed for identifying interacting targets.
Among the traditional methods which can be employed are
co-immunoprecipitation, crosslinking and co-purification through
gradients or chromatographic columns. Utilizing procedures such
as these allows for the identification of GAP gene products. Once
identified, a GAP protein can be used, in conjunction with
standard techniques, to identify its corresponding pathway gene.
For example, at least a portion of the amino acid sequence of the
pathway gene product can be ascertained using techniques well
known to those of skill in the art, such as via the Edman
degradation technique (see, e.g., Creighton, 1983, PROTEINS:
STRUCTURES AND MOLECULAR PRINCIPLES, W.H. Freeman & Co., N.Y.,
pp. 34-49). The amino acid sequence obtained can be used as a
guide for the generation of oligonucleotide mixtures that can be
used to screen for pathway gene sequences. Screening can be
accomplished, for example, by standard hybridization or PCR
techniques. Techniques for the generation of oligonucleotide
mixtures and for screening are well-known. (See, e.g., Ausubel,
supra, and PCR PROTOCOLS: A GUIDE TO METHODS AND APPLICATIONS,
1990, Innis et al., eds. Academic Press, Inc., New York).
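
How a short stretch of amino acid sequence translates into an
oligonucleotide mixture can be illustrated with the following Python
sketch. It assumes the standard genetic code and simply enumerates
every coding DNA sequence for a peptide; the function names and the
example peptide are hypothetical and are not taken from the cited
protocols.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map each amino acid (one-letter code) to the codons that encode it.
CODONS_FOR = {}
for (b1, b2, b3), aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_FOR.setdefault(aa, []).append(b1 + b2 + b3)


def degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that encode the peptide."""
    total = 1
    for aa in peptide.upper():
        total *= len(CODONS_FOR[aa])
    return total


def oligo_mixture(peptide: str) -> list:
    """All coding-strand oligonucleotides for the peptide."""
    choices = [CODONS_FOR[aa] for aa in peptide.upper()]
    return ["".join(codons) for codons in product(*choices)]
```

Peptide stretches rich in methionine and tryptophan (one codon each)
give the smallest mixtures, whereas each leucine, serine or arginine
multiplies the mixture size by six, which is why low-degeneracy
stretches are normally preferred when choosing the region to
back-translate.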

Additionally, methods can be employed which result in the
simultaneous identification of interacting target genes. One
method which detects protein interactions in vivo, the two-hybrid
system, is described in detail for illustration purposes only and
not by way of limitation. One version of this system has been
described (Chien et al., Proc. Natl. Acad. Sci. USA, 88: 9578-9582
(1991)) and is commercially available from Clontech (Palo Alto,
CA).

Briefly, utilizing such a system, plasmids are constructed
that encode two hybrid proteins: one consists of the DNA-binding
domain of a transcription activator protein fused to a known
protein, in this case an inventive protein, and the other contains
the activator protein's activation domain fused to an unknown
protein (a putative GAP, for instance) that is encoded by a cDNA
which has been recombined into this plasmid as part of a cDNA
library. The plasmids are transformed into a strain of the yeast
Saccharomyces cerevisiae that contains a reporter gene (e.g.,
lacZ) whose regulatory region contains the transcription
activator's binding sites. Either hybrid protein alone cannot
activate transcription of the reporter gene: the DNA-binding
domain hybrid cannot because it does not provide activation
function, and the activation domain hybrid cannot because it
cannot localize to the activator's binding sites. Interaction of
the two hybrid proteins reconstitutes the functional activator
protein and results in expression of the reporter gene, which is
detected by an assay for the reporter gene product.

The two-hybrid system or related methodology can be used to
screen activation domain libraries for proteins that interact with
a known "bait" gene product. By way of example, and not by way of
limitation, gene products known to be involved in TH cell
subpopulation-related disorders and/or differentiation,
maintenance, and/or effector function of the subpopulations can be
used as the bait gene products. Total genomic or cDNA sequences
are fused to the DNA encoding an activation domain. This library
and a plasmid encoding a hybrid of the bait gene product fused to
the DNA-binding domain are cotransformed into a yeast reporter
strain, and the resulting transformants are screened for those
that express the reporter gene. For example, and not by way of
limitation, the bait gene can be cloned into a vector such that it
is translationally fused to the DNA encoding the DNA-binding
domain of the GAL4 protein. These colonies are purified and the
library plasmids responsible for reporter gene expression are
isolated. DNA sequencing is then used to identify the proteins
encoded by the library plasmids.

The present invention, thus generally described, will be
understood more readily by reference to the following examples,
which are provided by way of illustration and are not intended to
be limiting of the present invention.
The examples below are provided to illustrate the subject
invention. These examples are provided by way of illustration and
are not included for the purpose of limiting the invention.

EXAMPLES 

EXAMPLE I: cDNA Library Construction

cDNA library plates and clones originated from five cDNA
libraries that were constructed by directional cloning. These are
available through the Resource Center (http://www.rzpd.de) of the
German Genome Project. In particular, the hfbr2 (human fetal
brain; RZPD number DKFZp564) and hfkd2 (human fetal kidney;
DKFZp566) libraries were generated using the Smart kit
(Clontech), except that PCR was carried out with primers that
contained uracil residues to permit directional cloning without
restriction digestion and ligation, and were complementary with
the pAMP1 (Life Technologies) cloning sites for directional
cloning. The htes3 (human testes; DKFZp434), hute1 (human uterus;
DKFZp586) and hmcf1 (human mammary carcinoma; DKFZp727) libraries
are conventional (Gubler, U., Hoffman, B.J. (1983) A simple and
very efficient method for generating cDNA libraries. Gene 25,
263-269), size-selected cDNA libraries. They are cloned into
pSPORT1 (Life Technologies) via a NotI site, which is introduced
during reverse transcription downstream of the oligo dT primer,
and a SalI site that is introduced by the ligation of adapters.
The human mammary carcinoma library was constructed from MCF7
cells.

In a similar fashion, the hamy2 (human amygdala nucleus
(inside the brain); RZPD number DKFZp761) and hmel2 (human
melanoma; RZPD number DKFZp762) libraries have been generated
using conventional approaches, employing a NotI-dT V primer for
first strand synthesis (GAGCGGCCGC(T)1 C !V). After second strand
synthesis, SalI adapters were ligated to the blunted cDNA. Then
the cDNA was cut with NotI to generate SalI-NotI compatible ends
at the 5' and 3' ends of the cDNA, respectively, to allow
directional cloning. The cDNAs were then size selected on agarose
gels in two dimensions and cloned into the pSPORT1 plasmid vector
which had been pre-cut with SalI and NotI (Life Technologies). The
DNA was transformed into the DH10B bacterial strain and single
colonies were picked into 384-well microtiter plates from the
non-amplified library. The human melanoma library was constructed
from MeWo cells, published by Kern, M.A., Helmbach, H., Artuc,
M., Karmann, D., Jurgovsky, K. and Schadendorf, D. (1997) Human
melanoma cell lines selected in vitro displaying various levels
of drug resistance against cisplatin, fotemustine, vindesine or
etoposide: modulation of proto-oncogene expression. Anticancer
Res. 17, 4359-4370.

The cDNA sequences of this application were first identified
among the sequences comprising various libraries. Technology has
advanced considerably since the first cDNA libraries were made.
Many small variations in both chemicals and machinery have been
instituted over time, and these have improved both the efficiency
and safety of the process. Although the cDNAs could be obtained
using an older procedure, the procedure presented in this
application is exemplary of one currently being used by persons
skilled in the art. For the purpose of providing an exemplary
method, the mRNA isolation and cDNA library construction
described here is for the MCF-7 library (DKFZp727) from which the
clones named DKFZphmcf1_xxyyxx were obtained.

The human cell line MCF-7 was grown in DMEM supplemented
with 10% fetal calf serum until confluency. 3 x 10^8 cells were
harvested with a cell scraper in PBS. Cells were lysed in buffer
containing 0.5% NP-40 to leave the nuclei intact. The debris was
pelleted by centrifugation at 15 000 x g for 10 minutes at 4
degrees Celsius. Proteins in the supernatant were degraded in
presence of SDS and Proteinase K (30 minutes at 56 degrees
Celsius). Precipitation of proteins was done in a
Phenol/Chloroform extraction. RNA was precipitated from the
aqueous phase with Na-acetate and Ethanol. Polyadenylated
messages were isolated using Qiagen Oligotex (QIAGEN, Hilden,
Germany).

First strand cDNA synthesis was accomplished using an oligo
(dT) primer which also contained a NotI restriction site. Second
strand synthesis was performed using a combination of DNA
polymerase I, E. coli ligase and RNase H, followed by the
addition of a SalI adaptor to the blunt ended cDNA. The SalI
adapted, double-stranded cDNA was then digested with NotI
restriction enzyme, and fractionated by size on an agarose gel.
DNA of the appropriate size was cut from the gel and cast into a
second gel at a 90° angle. After electrophoresis in the second
dimension, cDNA of the appropriate size was cut from the gel. The
agarose block was broken down with help of gelase. The cDNA was
purified with help of two phenol extractions and an ethanol
precipitation. The cDNA was ligated into SalI/NotI pre-digested
pSport1 vector (Life Technologies) and transformed into DH10B
bacteria.

All libraries were arrayed into 384-well microtiter plates
and spotted on high density nylon membranes for hybridization
analysis.

The hamy2 library consists of 121 384-well plates comprising 46464
clones. The hmel2 library consists of 72 384-well plates of
clones. Filters and clones are available through the Resource
Center of the German Genome Project (http://www.rzpd.de). Whole
library plates were distributed to the sequencing partners of the
consortium for systematic sequencing.

EXAMPLE II: Sequencing of cDNA Clones

All clones in the 384-well microtiter plates were sequenced
from the 5' end. Sequencing was done preferentially using dye
terminator chemistry (ABD or Amersham) on ABI automated DNA
sequencers (ABI 377, Applied Biosystems); one partner used EMBL
prototype instruments (Arakis) mainly with dye primer chemistry.

The resulting expressed sequence tag (EST) sequences ("r1
ESTs" = sequenced from 5'-end) were analysed for:

a) the lack of identical matches with known genes.

For this, the EST sequence was blasted against the cDNA
consortium's own database and after that against public databases
(with BLASTn and BLASTx against EMBL/EMBLNEW and assembled
ESTs; please refer to EXAMPLE III: Bioinformatics analysis of
full length cDNAs, for description and parameter settings). ESTs
which were identical to known genes in more than 100 bp, with
less than S mismatches, were excluded from further analysis.

b) the presence of an open reading frame

Open reading frames (ORFs) were detected with a tool
developed by Munich Information Center for Protein Sequences
(MIPS) called ORF-map. ORF-map visualises potential start and
stop codons. If an ORF without a stop codon was detected in an
r1-EST, the sequence was processed further.

c) the presence of GC rich sequences

A script developed by MIPS computed the GC-content of the
r1-sequence, which should be >40%. Writing similar scripts is
within the ordinary skill of one in bioinformatics.

d) the lack of repeat structures

Repeats such as Alu, Line or CA-repeats were detected by
blasting (BLASTn and BLASTx, please refer to EXAMPLE III:
Bioinformatics analysis of full length cDNAs, for description and
parameter settings) against a repeat-database compiled by MIPS.
If a repeat was present within the r1-sequence, the sequence was
not processed further.
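
Criteria b) and c) lend themselves to a short script. The Python sketch
below is an illustrative stand-in only: it is not the MIPS ORF-map tool
or the MIPS GC script, the function names are hypothetical, only the
three forward frames are scanned, and the 40% GC threshold is taken
from criterion c).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (0.0 if none)."""
    called = [b for b in seq.upper() if b in "ACGT"]
    if not called:
        return 0.0
    return sum(b in "GC" for b in called) / len(called)


def open_orfs(seq: str) -> list:
    """Return (frame, start_index) for each forward frame in which an
    ATG is followed by no in-frame stop codon before the read ends."""
    seq = seq.upper()
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOP_CODONS:
                start = None          # ORF closed; look for a new start
            elif codon == "ATG" and start is None:
                start = i             # first start codon of an open ORF
        if start is not None:
            found.append((frame, start))
    return found


def passes_orf_and_gc(seq: str, min_gc: float = 0.40) -> bool:
    """True if the read has an ORF lacking a stop and GC content > min_gc."""
    return bool(open_orfs(seq)) and gc_content(seq) > min_gc
```

A read such as `CCATGGCTGCTAAAGGG` (an invented example) carries an ATG
in the third frame with no downstream in-frame stop and would therefore
be passed on, whereas inserting an in-frame TAA closes the ORF and the
read is rejected.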

Novel clones that met all criteria were identified to the
sequencers, who then performed 3'-end sequencing of these clones.
The resulting 3' ESTs ("s1 ESTs" = sequenced from 3'-end) were
checked for

a) the lack of matches with known genes in public databases,
and sequences already generated by us.

This was done by blasting against EMBL/EMBLNEW and assembled
EST (BLASTn and BLASTx, please refer to EXAMPLE III:
Bioinformatics analysis of full length cDNAs, for description and
parameter settings).

b) the presence of polyadenylation signals.



Again only clones matching the selection criteria were
chosen to be sequenced completely by the sequencers. Clones were
selected according to the following criteria:

A "very good ORF" had at least one BLASTx match to other
proteins. A "good ORF" should extend to the 3' end and be longer
than ~40 codons. If the ORF started in the r1 sequence, in front
of the potential start codon there should not exist too many
competing start codons in frame with the ORF start codon, and the
start should match the Kozak consensus ATG. If the EST sequence
was too short to decide according to the potential ORF, and there
were only a few or no start codons in the sequence, the GC content
of the sequence should be greater than 40%. The r1 sequences
need not contain a polyA-tail at the 3' end. In addition, the
results of the blasting against the assembled human ESTs could
help in questionable cases to decide whether to stop or to
continue. A hit against these ESTs was an indication to go
further.
Clones passing the above-described screening were sequenced
in full. Sequencing was done preferentially using dye terminator
chemistry (ABD or Amersham) on ABI automated DNA sequencers (ABI
377, Applied Biosystems); one partner used EMBL prototype
instruments (Arakis) mainly with dye primer chemistry. Primer
walking (Strauss et al. 1986, Specific-primer-directed DNA
sequencing. Anal Biochem. 154, 353-360) was the preferred
sequencing strategy because of the lower redundancy possible
compared to random shotgun (Messing, J., Crea, R., Seeburg, P.H.
(1981) A system for shotgun DNA sequencing. Nucleic Acids Res. 9,
309-321) methods. Walking primers were generally designed using
software (e.g. Haas, S., Vingron, M., Poustka, A., Wiemann, S.
(1998) Primer design in large-scale sequencing. Nucleic Acids
Res. 26, 3006-3012; Schwager, C., Wiemann, S., Ansorge, W. (1995)
GeneSkipper: integrated software environment for DNA sequence
assembly and alignment. HUGO Genome Digest 2, 8-9) that permitted
complete automation of this usually time consuming process and
helped in the parallel processing of large numbers of clones.
EXAMPLE III: Bioinformatics analysis of full length cDNAs

Each sequence obtained was compared on nucleotide level in a
stepwise manner to sequences in EMBL/EMBLNEW, EMBL-EST, EMBL-STS
using the BLASTn algorithm. Basic Local Alignment Search Tool
(BLAST, Altschul S. F. (1993) J Mol Evol 36:290-300; Altschul, S.
F. et al (1990) J Mol Biol 215:403-10) is used to search for
local sequence alignments. BLAST produces alignments of both
nucleotide (BLASTn) and amino acid sequences (BLASTp or BLASTx)
to determine sequence similarity. BLAST is especially useful in
determining exact matches or in identifying homologs, because of
the local nature of the alignments. While it is useful for
matches which do not contain gaps, it is inappropriate for
performing motif-style searching. The fundamental unit of BLAST
algorithm output is the High-scoring Segment Pair (HSP).

15 An HSP consists of two sequence fragments of arbitrary but 

equal lengths whose alignment is locally maximal and for which 
the alignment BLAST approach is to look threshold or cut off 
score set by the user- BLAST looks for HSPs between a query 
sequence and a database sequence! to evaluate the statistical 

20 significance of any matches foundi and to report only those 
matches which satisfy the user-selected threshold of 
significance ■ The parameter E establishes the statistically 
significant threshold for reporting database sequence matches- E 
is interpreted as the upper bound of the expected frequency of 

25 chance occurrence of an HSP (or set of HSPs) within the context 
of the entire database search- Any database sequence whose match 
satisfies E is reported in the program output- Parameter settings 
for the BLAST-operations (BLASTN E-DanNP-WashU) described were: 
EMBL-EMBLNEU: H S D V=S B=5 -filter seg; EMBL-EST = H=D E=le-10 

30 B=5DD V=500 -filter seg^ EMBL-STS : H=D V=5 B=S- 
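
The notion of an HSP described above (an ungapped pair of equal-length fragments whose alignment score is locally maximal) can be sketched as follows. This is a brute-force illustration, not the BLAST heuristic itself: it scans every diagonal of the comparison matrix and keeps the best-scoring run. The function name and the match/mismatch scores of +5 and -4 are illustrative assumptions.

```python
def best_hsp(query: str, subject: str, match: int = 5, mismatch: int = -4):
    """Return (score, query_start, subject_start, length) of the
    highest-scoring ungapped segment pair.

    Every diagonal is scanned once; the running score is reset whenever
    it drops to zero or below, so each reported segment is locally maximal.
    """
    best = (0, 0, 0, 0)
    n, m = len(query), len(subject)
    for diag in range(-(n - 1), m):
        i = max(0, -diag)  # position in the query where the diagonal starts
        j = max(0, diag)   # position in the subject where it starts
        score = 0
        start_i, start_j = i, j
        while i < n and j < m:
            if score <= 0:
                score = 0
                start_i, start_j = i, j
            score += match if query[i] == subject[j] else mismatch
            if score > best[0]:
                best = (score, start_i, start_j, i - start_i + 1)
            i += 1
            j += 1
    return best


def report_hsp(query: str, subject: str, cutoff: int):
    """Return the best HSP only if its score meets the user-set cutoff."""
    hsp = best_hsp(query, subject)
    return hsp if hsp[0] >= cutoff else None
```

For example, `best_hsp("ACGTACGT", "TTACGTACGTTT")` finds the eight-base exact match starting at subject position 2. Real BLAST implementations avoid the exhaustive scan by seeding from short word hits and attach a statistical significance (E) to each score.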

Search against EMBL/EMBLNEW was done to determine whether
the cDNAs are already known, and also to find out whether the
cDNAs are encoded by genomic sequences already sequenced and
published/submitted to these databases.

Search against EMBL-EST was performed to get a first
impression of how abundant a particular cDNA would be and to get
information on tissue specificity (so-called "electronic
Northern blot"; e.g. some of the cDNAs derived from the testis
library show only hits to ESTs also derived from testis libraries).
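
The "electronic Northern blot" amounts to tallying the source tissue of every EST that matches a cDNA. A minimal sketch, assuming the EST hits have already been parsed into tuples of (EST identifier, library tissue, E-value); the tuple layout and function name are hypothetical, and the default cutoff mirrors the E-value given for the EMBL-EST search:

```python
from collections import Counter


def electronic_northern(hits, max_evalue: float = 1e-10) -> Counter:
    """Count significant EST hits per source tissue.

    hits: iterable of (est_id, tissue, evalue) tuples (assumed layout).
    Hits with an E-value above max_evalue are ignored.
    """
    counts = Counter()
    for est_id, tissue, evalue in hits:
        if evalue <= max_evalue:
            counts[tissue] += 1
    return counts
```

A cDNA whose counter contains a single tissue, for instance only testis, would correspond to the tissue-specific case mentioned above; the total count gives a rough impression of abundance.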

The cDNA sequences were blasted against EMBL-STS to
determine STS-sequence matches to the cDNA, thus providing
mapping information for the new cDNA.


The potential protein sequences were generated automatically
by a script searching for the longest open reading frame (ORF) in
each of the three forward frames with a minimum length of [...]
codons. Next, the automatically generated ORFs were translated
into protein sequences. These protein sequences were searched
against the non-redundant protein data set of
PIR/SwissProt/TrEMBL/TrEMBLnew (BLASTP 2.0a19MP-WashU; parameter
setting: V=7 B=7 H=0 -filter seg). If the script generated more
than one ORF, one ORF was chosen manually by the annotator
according to the degree of similarity to known proteins, the
location of the ORF in the cDNA, the length, the amino acid
composition and the content of PROSITE motifs.
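
The ORF search described above can be sketched as follows. This is an illustration under stated assumptions, not the script that was used: an ORF is taken to run from an ATG to the next in-frame stop codon, ORFs that run off the end of the sequence are ignored, the standard genetic code is used, and the minimum length is a required argument because its value is not legible in the text.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def longest_orfs(seq: str, min_codons: int):
    """Return [(frame, protein)] with the longest ORF of each forward frame.

    Only ORFs of at least min_codons residues (stop codon excluded)
    are reported. Unknown codons are translated as 'X'.
    """
    seq = seq.upper()
    results = []
    for frame in range(3):
        best = ""
        current = None  # protein being built, or None when outside an ORF
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            aa = CODON_TABLE.get(codon, "X")
            if current is None:
                if codon == "ATG":
                    current = "M"
            elif aa == "*":
                if len(current) > len(best):
                    best = current
                current = None
            else:
                current += aa
        if best and len(best) >= min_codons:
            results.append((frame, best))
    return results
```

When more than one frame yields an ORF, the list contains several candidates, which corresponds to the situation in which the annotator chose one manually.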

Additionally, a BLASTx search (BLASTX 2.0a19MP-WashU
against the non-redundant protein database comprising
PIR/SWISSPROT/TREMBL/TREMBLNEW; parameter settings were:
matrix /home/d[...] V=5 B=5 -filter seg) was run to find
potential frame shifts in the complementary
cds of the cDNAs and to identify unspliced or partly spliced
cDNAs. The protein sequence was then transferred to the PEDANT
system in order to generate additional information on the new
proteins. PEDANT (Protein Extraction, Description, and ANalysis
Tool; Frishman, D. & Mewes, H.-W. (1997) PEDANTic genome
analysis. Trends in Genetics 13, 415-416) is a platform
developed at the Munich Information Center for Protein Sequences
(MIPS, Munich, Germany), which incorporates practically all
bioinformatics methods important for the functional and
structural characterisation of protein sequences. Computational
methods used by PEDANT are:
FASTA

Very sensitive protein sequence database searches with
estimates of statistical significance. Pearson, W.R. (1990) Rapid
and sensitive sequence comparison with FASTP and FASTA. Methods
Enzymol. 183, 63-98.

BLAST2

Very sensitive protein sequence database searches with
estimates of statistical significance. Altschul, S.F., Gish, W.,
Miller, W., Myers, E.W., and Lipman, D.J. Basic local alignment
search tool. Journal of Molecular Biology 215, 403-10.

PREDATOR

High-accuracy secondary structure prediction from single and
multiple sequences. Frishman, D. and Argos, P. (1997) 75%
accuracy in protein secondary structure prediction. Proteins 27,
329-335. Frishman, D. and Argos, P. (1996) Incorporation of long-distance
interactions in a secondary structure prediction
algorithm. Prot. Eng. 9, 133-142.

STRIDE

Secondary structure assignment from atomic coordinates.
Frishman, D. and Argos, P. (1995) Knowledge-based secondary
structure assignment. Proteins 23, 566-579.

CLUSTALW

Multiple sequence alignment. Thompson, J.D., Higgins, D.G.
and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of
progressive multiple sequence alignment through sequence
weighting, positions-specific gap penalties and weight matrix
choice. Nucleic Acids Research 22, 4673-4680.

TMAP

Transmembrane region prediction from multiply aligned
sequences. Persson, B. and Argos, P. (1994) Prediction of
transmembrane segments in proteins utilising multiple sequence
alignments. J. Mol. Biol. 237, 182-192.
ALOM2

Transmembrane region prediction from single sequences.
Klein, P., Kanehisa, M., and DeLisi, C. Prediction of protein
function from sequence properties: A discriminant analysis of a
database. Biochim. Biophys. Acta 787, 221-226 (1984). Version 2
by Dr. K. Nakai.

SIGNALP

Signal peptide prediction. Nielsen, H., Engelbrecht, J.,
Brunak, S., and von Heijne, G. (1997) Identification of
prokaryotic and eukaryotic signal peptides and prediction of
their cleavage sites. Protein Engineering 10, 1-6.

SEG

Detection of low complexity regions in protein sequences.
Wootton, J.C., Federhen, S. (1993) Statistics of local complexity
in amino acid sequences and sequence databases. Computers &
Chemistry 17, 149-163.

COILS

Detection of coiled coils. Lupas, A., M. Van Dyke, and J.
Stock, "Predicting Coiled Coils from Protein Sequences." Science

PROSEARCH

Detection of PROSITE protein sequence patterns. Kolakowski,
L.F. Jr., Leunissen, J.A.M., Smith, J.E. (1992) ProSearch: fast
searching of protein sequences with regular expression patterns
related to protein structure and function. Biotechniques 13.

BLIMPS

Similarity searches against a database of ungapped blocks.
J.C. Wallace and Henikoff, S. (1992) PATMAT: a searching and
extraction program for sequence, pattern and block queries and
databases. CABIOS 8, 249-254. Written by Bill Alford.

Hidden Markov model software. Sonnhammer, E.L.L., Eddy,
S.R., Durbin, R. (1997) Pfam: A Comprehensive Database of Protein
Families Based on Seed Alignments. Proteins 28, 405-420.

pI

Perl script that returns the amino acid composition, molecular
weight, theoretical pI, and expected extinction coefficient of an
amino acid sequence. By Fred Lindberg.

The parameter settings were as follows: known3d: score > 100;
BLAST: E-value < 10; SCOP: <= 50 alignments, E-value < 0.0001;
signalp: Y=0.7, examined from the N-terminus: 50 aa; funcat:
E-value < 0.001; BLOCKS: <= 10 hits; BLIMPS: threshold 1100.0;
COILS: threshold 0.95; SEG: threshold 20.0; BLAST in report:
E-value < 0.001; PIR-KW, superfamilies, EC numbers in report:
E-value < 0.00001; known3d in report: score > 120.
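
Two of the quantities returned by the pI script, the amino acid composition and the expected extinction coefficient, can be illustrated with a short sketch. This is not the Perl script by Fred Lindberg, whose method is not described here. The extinction coefficient is estimated at 280 nm from the commonly used per-residue values (Trp 5500, Tyr 1490, cystine 125 per M per cm), which is an assumption about the calculation; the function names are hypothetical.

```python
from collections import Counter


def composition(protein: str) -> Counter:
    """Count of each amino acid (one-letter code) in the sequence."""
    return Counter(protein.upper())


def extinction_coefficient(protein: str, cystines: bool = False) -> int:
    """Estimated molar extinction coefficient at 280 nm (M^-1 cm^-1).

    Uses 5500 per Trp and 1490 per Tyr. If cystines is True, every
    pair of Cys residues is assumed to form a disulfide contributing 125.
    """
    counts = composition(protein)
    eps = 5500 * counts["W"] + 1490 * counts["Y"]
    if cystines:
        eps += 125 * (counts["C"] // 2)
    return eps
```

For a sequence such as `"WWYCC"`, the estimate is 12490 with reduced cysteines and 12615 if the two cysteines are assumed to form one cystine.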

The results of PEDANT analysis together with the results of
the similarity searches constitute the basis for the structural
and functional annotation of the cDNAs and the encoded proteins,
as specified herein.
claim:

1. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: amy2_12g7, amy2_12i1, amy2_13g19, amy2_16e14,
amy2_24k15, amy2_2a13, amy2_2i17, fbr2_78d18, fbr2_78e18,
amy2_121m2, amy2_24b4, amy2_121f19, tes3_16b5, amy2_1i24,
amy2_1j19, amy2_2b19, amy2_7j5, amy2_14b5, amy2_2o13, fkd2_3k1,
mel2_7g14, mel2_12j1, mel2_7k19, amy2_2c22, fbr2_78i21,
amy2_11n4, amy2_1c12, amy2_1i1, amy2_2f22, amy2_2g12, fbr2_78c12,
tes3_10i16, tes3_31a10, amy2_10h17, amy2_10p7, amy2_12d7,
amy2_2f18, tes3_11c22, tes3_11d21, tes3_29f24, tes3_31j20,
tes3_5k22, Tes3_10n10, Tes3_11e17, Tes3_12d18, Tes3_14l7,
Tes3_15n14, Tes3_16p3, Tes3_19p12, Tes3_21k14, Tes3_22i11,
Tes3_22l24, tes3_26g3, tes3_30p6, amy2_11d2, amy2_121o17,
amy2_1i14, amy2_24c8, fbr2_78d4, tes3_11a17, tes3_17i21,
tes3_20h12, tes3_7n12, tes3_9e16, amy2_14m16, tes3_18n14, their
complements, and variants thereof.



2. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: amy2_12g7, amy2_12i1, amy2_13g19, amy2_16e14,
amy2_24k15, amy2_2a13, amy2_2i17, amy2_121m2, amy2_24b4,
amy2_121f19, amy2_1i24, amy2_1j19, amy2_2b19, amy2_7j5,
amy2_14b5, amy2_2o13, amy2_2c22, amy2_11n4, amy2_1c12, amy2_1i1,
amy2_2f22, amy2_2g12, amy2_10h17, amy2_10p7, amy2_12d7,
amy2_2f18, amy2_11d2, amy2_121o17, amy2_1i14, amy2_24c8, their
complements, and variants thereof.

3. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: fbr2_78d18, fbr2_78e18, fbr2_78i21, fbr2_78c12,
fbr2_78d4, their complements, and variants thereof.



4. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: amy2_121m2, amy2_24b4, their complements, and
variants thereof.
5. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: amy2_121f19, tes3_16b5, their complements, and
variants thereof.

6. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: amy2_1i24, amy2_1j19, amy2_2b19, amy2_7j5, their
complements, and variants thereof.

7. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: amy2_14b5, amy2_2o13, fkd2_3k1, mel2_7g14, their
complements, and variants thereof.

8. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: mel2_7g14, mel2_12j1, mel2_7k19, their
complements, and variants thereof.

9. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: amy2_2c22, fbr2_78i21, their complements, and
variants thereof.

10. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: amy2_11n4, amy2_1i1, amy2_2g12, fbr2_78c12,
tes3_10i16, tes3_31a10, their complements, and variants thereof.

11. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: amy2_10h17, amy2_10p7, amy2_12d7, amy2_2f18,
tes3_11c22, tes3_11d21, tes3_29f24, tes3_31j20, tes3_5k22, their
complements, and variants thereof.

12. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: tes3_16b5, tes3_10i16, tes3_31a10, tes3_11c22,
tes3_11d21, tes3_29f24, tes3_31j20, tes3_5k22, Tes3_10n10,
Tes3_11e17, Tes3_12d18, Tes3_14l7, Tes3_15n14, Tes3_16p3,
Tes3_19p12, Tes3_21k14, Tes3_22i11, Tes3_22l24, tes3_26g3,
tes3_30p6, tes3_11a17, tes3_17i21, tes3_20h12, tes3_7n12,
tes3_9e16, their complements, and variants thereof.

13. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: amy2_11d2, amy2_121o17, amy2_1i14, amy2_24c8,
fbr2_78d4, tes3_11a17, tes3_17i21, tes3_20h12, tes3_7n12,
tes3_9e16, their complements, and variants thereof.

14. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: amy2_14m16, tes3_18n14, amy2_1c12, amy2_2f22,
their complements, and variants thereof.

15. A nucleic acid molecule comprising a nucleotide
sequence of the clone fkd2_3k1.

16. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: amy2_12g7, amy2_12i1,
amy2_13g19, amy2_16e14, amy2_24k15, amy2_2a13, amy2_2i17,
fbr2_78d18, fbr2_78e18, amy2_121m2, amy2_24b4, amy2_121f19,
tes3_16b5, amy2_1i24, amy2_1j19, amy2_2b19, amy2_7j5, amy2_14b5,
amy2_2o13, fkd2_3k1, mel2_7g14, mel2_12j1, mel2_7k19, amy2_2c22,
fbr2_78i21, amy2_11n4, amy2_1c12, amy2_1i1, amy2_2f22, amy2_2g12,
fbr2_78c12, tes3_10i16, tes3_31a10, amy2_10h17, amy2_10p7,
amy2_12d7, amy2_2f18, tes3_11c22, tes3_11d21, tes3_29f24,
tes3_31j20, tes3_5k22, Tes3_10n10, Tes3_11e17, Tes3_12d18,
Tes3_14l7, Tes3_15n14, Tes3_16p3, Tes3_19p12, Tes3_21k14,
Tes3_22i11, Tes3_22l24, tes3_26g3, tes3_30p6, amy2_11d2,
amy2_121o17, amy2_1i14, amy2_24c8, fbr2_78d4, tes3_11a17,
tes3_17i21, tes3_20h12, tes3_7n12, tes3_9e16, amy2_14m16,
tes3_18n14, their complements, and variants thereof.

17. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: amy2_12g7, amy2_12i1,
amy2_13g19, amy2_16e14, amy2_24k15, amy2_2a13, amy2_2i17,
amy2_121m2, amy2_24b4, amy2_121f19, amy2_1i24, amy2_1j19,
amy2_2b19, amy2_7j5, amy2_14b5, amy2_2o13, amy2_2c22, amy2_11n4,
amy2_1c12, amy2_1i1, amy2_2f22, amy2_2g12, amy2_10h17, amy2_10p7,
amy2_12d7, amy2_2f18, amy2_11d2, amy2_121o17, amy2_1i14,
amy2_24c8, their complements, and variants thereof.

18. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: fbr2_78d18, fbr2_78e18,
fbr2_78i21, fbr2_78c12, fbr2_78d4, their complements, and
variants thereof.

19. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: amy2_121m2, amy2_24b4,
their complements, and variants thereof.

20. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: amy2_121f19, tes3_16b5,
their complements, and variants thereof.

21. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: amy2_1i24, amy2_1j19,
amy2_2b19, amy2_7j5, their complements, and variants thereof.

22. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: amy2_14b5, amy2_2o13,
fkd2_3k1, mel2_7g14, their complements, and variants thereof.

23. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: mel2_12j1, mel2_7k19,
their complements, and variants thereof.

24. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: amy2_2c22, fbr2_78i21,
their complements, and variants thereof.
25. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: amy2_11n4, amy2_1i1,
amy2_2g12, fbr2_78c12, tes3_10i16, tes3_31a10, their complements,
and variants thereof.

26. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: amy2_10h17, amy2_10p7,
amy2_12d7, amy2_2f18, tes3_11c22, tes3_11d21, tes3_29f24,
tes3_31j20, tes3_5k22, their complements, and variants thereof.

27. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: tes3_16b5, tes3_10i16,
tes3_31a10, tes3_11c22, tes3_11d21, tes3_29f24, tes3_31j20,
tes3_5k22, Tes3_10n10, Tes3_11e17, Tes3_12d18, Tes3_14l7,
Tes3_15n14, Tes3_16p3, Tes3_19p12, Tes3_21k14, Tes3_22i11,
Tes3_22l24, tes3_26g3, tes3_30p6, tes3_11a17, tes3_17i21,
tes3_20h12, tes3_7n12, tes3_9e16, their complements, and variants
thereof.


28. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: amy2_11d2, amy2_121o17,
amy2_1i14, amy2_24c8, fbr2_78d4, tes3_11a17, tes3_17i21,
tes3_20h12, tes3_7n12, tes3_9e16, their complements, and variants
thereof.

29. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: amy2_14m16, tes3_18n14,
amy2_1c12, amy2_2f22, their complements, and variants thereof.

30. A computer readable medium, comprising in electronic
form a nucleic acid or protein sequence of the clone fkd2_3k1.






